OM nucleic - nucleic search, using sw model



Run on: December 14, 2003, 13:02:49 ; Search time 104 Seconds 

(without alignments) 
6548.598 Million cell updates/sec 

Title: US-09-891-138A-1

Perfect score: 1543 

Sequence: 1 gctcctggcagagttttctg tgcctaaataaatcaatata 1543 



Scoring table: IDENTITY_NUC

Gapop 10.0, Gapext 1.0

Searched: 569978 seqs, 220691566 residues 

Total number of hits satisfying chosen parameters: 1139956 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 



Post-processing : 



Minimum Match 0% 
Maximum Match 100% 
Listing first 45 summaries 



Database 



Issued_Patents_NA:*

1: /cgn2_6/ptodata/l/ina/5A_COMB.seq:*

2: /cgn2_6/ptodata/l/ina/5B_COMB.seq:*

3: /cgn2_6/ptodata/l/ina/6A_COMB.seq:*

4: /cgn2_6/ptodata/l/ina/6B_COMB.seq:*

5: /cgn2_6/ptodata/l/ina/PCTUS_COMB.seq:

6: /cgn2_6/ptodata/l/ina/backfiles1.seq:



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
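
The header above names a Smith-Waterman ("sw") model with gap-open 10.0 and gap-extension 1.0. The following is a minimal sketch of affine-gap local alignment scoring under those settings, not the search program's actual implementation. The function name `sw_affine_score`, the match reward of 1.0, and the mismatch penalty of 0.6 are assumptions for illustration; the header does not state the substitution values. A gap of k residues is assumed to cost gap_open + k * gap_ext.

```python
def sw_affine_score(query, target, match=1.0, mismatch=-0.6,
                    gap_open=10.0, gap_ext=1.0):
    """Return the best local alignment score (Smith-Waterman with affine gaps).

    Assumption: a gap of k residues costs gap_open + k * gap_ext.
    The match/mismatch values are illustrative, not taken from the report.
    """
    neg_inf = float("-inf")
    n = len(target)
    h_prev = [0.0] * (n + 1)      # best score of an alignment ending at (i-1, j)
    f_prev = [neg_inf] * (n + 1)  # best score ending with a gap in the target
    best = 0.0
    for i in range(1, len(query) + 1):
        h_cur = [0.0] * (n + 1)
        f_cur = [neg_inf] * (n + 1)
        e = neg_inf               # best score ending with a gap in the query
        for j in range(1, n + 1):
            # Open a new gap or extend an existing one.
            e = max(h_cur[j - 1] - gap_open - gap_ext, e - gap_ext)
            f_cur[j] = max(h_prev[j] - gap_open - gap_ext, f_prev[j] - gap_ext)
            s = match if query[i - 1] == target[j - 1] else mismatch
            # Local alignment: scores never drop below zero.
            h_cur[j] = max(0.0, h_prev[j - 1] + s, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


if __name__ == "__main__":
    left, right = "ACGTTGCAAGCTAGC", "GGATCCATGGCATTA"
    print(sw_affine_score("ACGT", "ACGT"))
    print(sw_affine_score(left + right, left + "T" + right))
```

The second call aligns two 15-base blocks across a single inserted base, so the sketch has to pay one gap-open plus one gap-extension to join them.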
RESULT 1

US-08-559-524A-1 

; Sequence 1, Application US/08559524A 

; Patent No. 5871963 

; GENERAL INFORMATION: 

APPLICANT: Conley, Pamela B. 
; APPLICANT: Jantzen, Hans-Michael 
; TITLE OF INVENTION: NOVEL PURINERGIC RECEPTOR 

NUMBER OF SEQUENCES: 14 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: MORGAN, LEWIS & BOCKIUS LLP 
STREET: 1800 M Street, N.W. 
CITY: Washington 
STATE: D.C. 
COUNTRY: USA 
ZIP: 20036-5869 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 



COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/559,524A
FILING DATE: 15-NOV-1995 
CLASSIFICATION: 435 
ATTORNEY/AGENT INFORMATION: 
NAME: Adler, Reid G. 
REGISTRATION NUMBER: 30,988 

REFERENCE/ DOCKET NUMBER: 044481-5010-00-US
TELECOMMUNICATION INFORMATION:
TELEPHONE: 202-467-7000
TELEFAX: 202-467-7176 
INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 1996 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: single
TOPOLOGY: linear 
MOLECULE TYPE: cDNA 
FEATURE: 

NAME/KEY: CDS 
LOCATION: 625..1626
US-08-559-524A-1 

Query Match 38.2%; Score 589.2; DB 2; Length 1996; 

Best Local Similarity 75.1%; Pred. No. 5.3e-156; 

Matches 762; Conservative 0; Mismatches 248; Indels 4; Gaps 2; 

Qy 39 GCAGAATGGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATA 98

II MINI I I I I 1 I I I M I I I I I I I I I I I I t 

Db 632 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 691

Qy 99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158

I | | | | | | | | I I I I I I I I I I i i M M I I I I I ! I I I I I M I I I I II 

Db 692 AGTACTACCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCA 751

Qy 159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218

Ml | 1 I I I II I I I I I I II I II I I I I I I I I I M I I I I II Ml I I I I I I I 
Db 752 TTGTTGTTTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCT 811

Qy 219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278

| | | | | | | || I I I I II I 1 I I Mill I I I I I I I I I I I I I I I I I I I I I I I I I I M 

Db 812 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 871 

Qy 279 ATGCCAATGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTC 338

I M I II I I I II IN M I I M I I II Mill M M I I II I I M I M I I I M I I 
Db 872 ATGCCAATGGAAACTGGATATATGGAGACGTGCTCTGCATAAGCAACCGATATGTGCTTC 931

Qy 339 ACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGC 398

I I I 1 I I M II I II M II II Mill I II I II II II I I I M M Mill II 
Db 932 ATGCCAACCTCTATACCAGCATTCTCTTTCTCACTTTTATCAGCATAGATCGATACTTGA 991

Qy 399 TCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCT 458

| || | | M I II I I M I II II M II I I II I I II II M M Mill II II I I II I I 

Db 992 TAATTAAGTATCCTTTCCGAGAACACCTTCTGCAAAAGAAAGAGTTTGCTATTTTAATCT 1051



Qy 459 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518

| MM I 1 II I II I M I I I I I I I I 1 I I I I I II I I I I I I I I I I 

Db 1052 CCTTGGCCATTTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 1111

Qy 519 CTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAAC 578

MM II M I I I I I I I I I I I I I I I I I M I I I I I I r I M I I 
Db 1112 CTGTTATAACTGACAATGGCACCACCTGTAATGATTTTGCAAGTTCTGGAGACCCCAACT 1171

Qy 579 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638 

MM Mill I I M M II M M M M M I Mill 

Db 1172 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 1231 

Qy 639 TGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAA 698

II || M II I I I I I I I I M I I I M II I II II MM III 

Db 1232 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 1291 

Qy 699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758 

II M II II I M M I I I I I Mill I I I M I M I M II I 

Db 1292 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 1351 

Qy 759 TACTCTTCACACCCTATCATATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTT 818

j M II I I M I M M I I I I II I I I III II M I I I II I I I M I M M I I II I 

Db 1352 TGCTTTTTACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTT 1411 
Qy 819 G G CCAC AAG GAT GT ACAC AGAAGG C CAT CAAAT CT AT AT AC AC ACT G AC AC GG C CT C 875 

| M I M I M II I I M I I I M M M I II I II 

Db 1412 GGAAGCAGT AT C AGT GC ACT CAGGTCGT CAT CAACT CCTTTTACATTGT GAGAC GGGCTT 1471 

Qy 876 TGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTCCTCATGGGAGACCATTACA 935

Ml M 1 II I M I I II M M II I I II M II I I I II M I M I II I M I II 

Db 1472 TGGGCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 1531 

Qy 936 GAGAGATGCTGATTAGTAAGTTCAGACAATACTTCAAGTCCCTTACATCCTTCAGGACAT 995

I | | M II II II I I I I I I M I M II M I I II M I I I I 1 M I I 

Db 1532 GGGACATGCTGATGAATCAACTGAGACACAACTTCAAATCCCTTACATCCTTTAGCAGAT 1591

Qy 996 GAGCTGCTGGATGCAGGTCTTCACTCAGCCAAAA-TGAGACACTTGATAAACAG 1048

| | || || | I I I I I I II I I I M I II I II M M I I I I 

Db 1592 GGGCTCATGAACTCCTACTTTCATTCAGAGAAAAGTGAGGGGCTTGTGAAACAG 1645
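
The summary statistics printed for RESULT 1 can be cross-checked with simple arithmetic. The sketch below is an interpretation inferred from the reported numbers, not a documented definition from the search program: Query Match appears to be the score as a percentage of the perfect score (1543), and Best Local Similarity appears to be matches divided by all aligned columns. The function names are hypothetical, and the 0.6 mismatch penalty is an assumption chosen because it reproduces the reported score together with the header's Gapop 10.0 and Gapext 1.0.

```python
PERFECT_SCORE = 1543  # "Perfect score" from the report header


def query_match(score, perfect=PERFECT_SCORE):
    """Score as a percentage of the perfect score, to one decimal place."""
    return round(100.0 * score / perfect, 1)


def best_local_similarity(matches, mismatches, indels):
    """Matches as a percentage of all aligned columns, to one decimal place."""
    return round(100.0 * matches / (matches + mismatches + indels), 1)


def implied_score(matches, mismatches, indels, gaps,
                  mismatch_penalty=0.6, gap_open=10.0, gap_ext=1.0):
    """Score implied by the counts; mismatch_penalty is an assumed value."""
    raw = (matches - mismatch_penalty * mismatches
           - gap_open * gaps - gap_ext * indels)
    return round(raw, 1)


if __name__ == "__main__":
    # Values taken from the RESULT 1 summary lines.
    print(query_match(589.2))                 # reported: 38.2%
    print(best_local_similarity(762, 248, 4)) # reported: 75.1%
    print(implied_score(762, 248, 4, 2))      # reported: 589.2
```

Under these assumptions the three computed values agree with the Query Match, Best Local Similarity, and Score fields reported for RESULT 1.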



RESULT 2 
US-08-749-707-1 

Sequence 1, Application US/08749707 
Patent No. 6063582 
GENERAL INFORMATION: 

APPLICANT: Conley, Pamela B. 
APPLICANT: Jantzen, Hans-Michael 

TITLE OF INVENTION: NOVEL PURINERGIC RECEPTOR 
NUMBER OF SEQUENCES: 14 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: MORGAN, LEWIS & BOCKIUS LLP 
STREET: 1800 M Street, N.W. 
CITY: Washington 
STATE: D.C. 
COUNTRY: USA 



ZIP: 20036-5869 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30
; CURRENT APPLICATION DATA: 

; APPLICATION NUMBER: US/08/749,707 

FILING DATE: 15-NOV-1996 
; CLASSIFICATION: 536

ATTORNEY/AGENT INFORMATION: 

NAME: Adler, Reid G. 

REGISTRATION NUMBER: 30,988 

REFERENCE/ DOCKET NUMBER: 044481-5010-01-US 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: 202-467-7000 
TELEFAX: 202-467-7176 
INFORMATION FOR SEQ ID NO: 1: 
; SEQUENCE CHARACTERISTICS: 

LENGTH: 1996 base pairs 
; TYPE: nucleic acid 

STRANDEDNESS : single 
TOPOLOGY: linear 
MOLECULE TYPE: cDNA 
FEATURE:

NAME/KEY: CDS
LOCATION: 625..1626
US-08-749-707-1 

Query Match 38.2%; Score 589.2; DB 3; Length 1996; 

Best Local Similarity 75.1%; Pred. No. 5.3e-156; 

Matches 762; Conservative 0; Mismatches 248; Indels 4; Gaps 2; 



Qy 39 GCAGAATGGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATA 98

II I I I I II I 1 I I I I I I I I I I I I I I I I I I M I I I M I I I I I I 
Db 632 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 691

Qy 99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158

I I I I I i I II I I I i I I I I I I I I I I I I M I I I I I I I I I I 1 I I I I II 

Db 692 AGTACTACCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCA 751

Qy 159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218

I I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I II! ! MM! I 
Db 752 TTGTTGTTTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCT 811

Qy 219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278

I I I I M I II I I I I I I I I I I I I I I I I I I I I I I i I I I Mill I I M I I I I I I II 

Db 812 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 871 

Qy 279 ATGCCAATGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTC 338

M I I I I I I I II Ml I I I I I I I I II I I I I I I I M I I I I I I I I I I I I I I I I I I 
Db 872 ATGCCAATGGAAACTGGATATATGGAGACGTGCTCTGCATAAGCAACCGATATGTGCTTC 931

Qy 339 ACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGC 398

I I I I M I I II I I I M I II I I I I I I I I M I I I I II Mill II Mill II 
Db 932 ATGCCAACCTCTATACCAGCATTCTCTTTCTCACTTTTATCAGCATAGATCGATACTTGA 991 



Qy 399 TCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCT 458

       | || ||||| ||||||||||||||| |||| |||||||| || ||||| ||||||||||

Db 992 TAATTAAGTATCCTTTCCGAGAACACCTTCTGCAAAAGAAAGAGTTTGCTATTTTAATCT 1051



Qy 459 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1052 CCTTGGCCATTTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 1111 

Qy 519 CTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAAC 578

I I I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I 
Db 1112 CTGTTATAACTGACAATGGCACCACCTGTAATGATTTTGCAAGTTCTGGAGACCCCAACT 1171

Qy 579 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638

I I I I I I I I I I I II t I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1172 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 1231 

Qy 639 TGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAA 698

I I I I I I I I I I I I II I M I I I I I I I I I I I I I I I I I I I I I I III 

Db 1232 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 1291 

Qy 699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758

I I II I II I I II M M I I I I I I I I II MM II M II I I I M I I 

Db 1292 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 1351 

Qy 759 TACTCTTCACACCCTATCATATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTT 818

I II II I I I I I I I I I M I I I II II Ml M I M I II I I M M II i I II I I I I 

Db 1352 TGCTTTTTACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTT 1411

Qy 819 G GC C ACAAGGAT GT AC AC AGAAG GC C AT CAAAT CT AT AT AC ACAC T GAC AC G GC C T C 8 75 

I II I II I I M I I I II I M I I I I II I II II II M I I 

Db 1412 GGAAGCAGTATCAGTGCACTCAGGTCGTCATCAACTCCTTTTACATTGTGACACGGGCTT 1471 

Qy 876 TGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTCCTCATGGGAGACCATTACA 935

III I I I M I I I II I II I II I I I I II I I I II I I II II II II II I II I II 

Db 1472 TGGGCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 1531

Qy 936 GAGAGATGCTGATTAGTAAGTTCAGACAATACTTCAAGTCCCTTACATCCTTCAGGACAT 995

I II I I I II II I I I I I Mill II I I M I II II II II I I I II I II I II 
Db 1532 GGGACATGCTGATGAATCAACTGAGACACAACTTCAAATCCCTTACATCCTTTAGCAGAT 1591

Qy 996 GAGCTGCTGGATGCAGGTCTTCACTCAGCCAAAA-TGAGACACTTGATAAACAG 1048

I I II III I I I II I II I I II I II M I II I I II I I I 

Db 1592 GGGCTCATGAACTCCTACTTTCATTCAGAGAAAAGTGAGGGGCTTGTGAAACAG 1645 



RESULT 3 

US-09-016-434-1068 

Sequence 1068, Application US/09016434 
Patent No. 6500938 
GENERAL INFORMATION: 

APPLICANT: Janice Au-Young 
APPLICANT: Jeffrey J. Seilhamer 

TITLE OF INVENTION: COMPOSITION FOR THE DETECTION OF SIGNALING 
TITLE OF INVENTION: PATHWAY GENE EXPRESSION 
NUMBER OF SEQUENCES: 1490
CORRESPONDENCE ADDRESS: 

ADDRESSEE: INCYTE PHARMACEUTICALS, INC. 



STREET: 3174 PORTER DRIVE 

CITY: PALO ALTO 

STATE: CALIFORNIA

COUNTRY: USA 
; ZIP: 94304 

COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 
; OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: Word Perfect 6.1 for Windows/MS-DOS 6.2
; CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/09/016,434

FILING DATE: HEREWITH 

CLASSIFICATION: 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: 
; FILING DATE: 

CLASSIFICATION: 
ATTORNEY/AGENT INFORMATION: 

NAME: Zeller, Karen J. 

REGISTRATION NUMBER: 37,071 

REFERENCE/ DOCKET NUMBER: PA-0002 US 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (650) 855-0555 

TELEFAX: (650) 845-4166 
; INFORMATION FOR SEQ ID NO: 1068: 
; SEQUENCE CHARACTERISTICS : 

; LENGTH: 1429 base pairs 

TYPE: nucleic acid 

STRANDEDNESS: single 

TOPOLOGY: linear 
IMMEDIATE SOURCE: 

LIBRARY: GENBANK

CLONE: g1124904
US-09-016-434-1068 

Query Match 5.7%; Score 88.4; DB 4; Length 1429; 

Best Local Similarity 45.7%; Pred. No. 5.6e-15; 

Matches 385; Conservative 0; Mismatches 451; Indels 6; Gaps 2;



Qy 107 CTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTG 166 

II Ml I i II t ! I I II I III II I I I I I I 

Db 292 CTGCCTGTGAGCTATGCAGTTGTCTTTGTGCTGGGCTTGGGCCTTAACGCCCCAACCCTA 351

Qy 167 TTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTT 226 

I I I. I I I I I I I I I I I I I I I II I III MM 

Db 352 TGGCTCTTCATCTTCCGCCTCCGACCCTGGGATGCAACGGCCACCTACATGTTCCACCTG 411 

Qy 227 TCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTAT GCC 283 

I I I I I I I I M I M II I I I M M I MM Ml 

Db 412 GCATTGTCAGACACCTTGTATGTGCTGTCGCTGCCCACCCTCATCTACTATTATGCAGCC 471 

Qy 284 AATGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACC 343

I | | I I I I I I I I ! I I I II I I I I I I I I I I 

Db 472 CACAACCACTGGCCCTTTGGCACTGAGATCTGCAAGTTCGTCCGCTTTCTTTTCTATTGG 531 

Qy 344 AACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATG 403 



1 1 1 1 1 1 1 1 1 III 1 1 1 1 1 1 1 1 1 1 1 1 I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 III 

Db 532 AACCTCTACTGCAGTGTCCTTTTCCTCACCTGCATCAGCGTGCACCGCTACCTGGGCATC 591

Qy 404 AAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTG 463

I I I I I M I I I III I Ml Ml 

Db 592 TGCCACCCACTTCGGGCACTACGCTGGGGCCGCCCTCGCCTCGCAGGCCTTCTCTGCCTG 651 

Qy 464 GCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTC 523

I I I I II I I I I I I I I II I I I I Mill I I 

Db 652 GCAGTTTGGTTGGTCGTAGCCGGCTGCCTCGTGCCCAACCTGTTCTTTGTCACAACCAGC 711 

Qy 524 CCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAACACAAT 583

I II I I III III I I I I I I I I I I I I I I I 

Db 712 AACAAAGGGACCACCGTCCTGTGCCATGACACCACTCGGCCTGAAGAGTTTGACCACTAT 771

Qy 584 CTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGC 643

I I I I I I II I I I I I 1 III I III 

Db 772 GTGCACTTCAGCTCGGCGGTCATGGGGCTGCTCTTTGGCGTGCCCTGCCTGGTCACTCTT 831

Qy 644 TTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCC 703

I I I I I MM I I I I I Ml I 

Db 832 GTTTGCTATGGACTCATGGCTCGTCGCCTGTATCAGCCCTTGCCAGGCTCTGCACAGTCG 891

Qy 704 CTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTC 763

II I I I I II II I I I I I II II I I I I 

Db 892 TCTTCTCGCCTCCGCTCTCTCCGCACCATAGCTGTGGTGCTGACTGTCTTTGCTGTCTGC 951

Qy 764 TTCACACCCTATCATATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTTGGCCA 823

Ml II I II II II II M I I ' I I II I I 
Db 952 TTCGTGCCTTTCCACATCACCCGCACCATTTACTACCTGGCCAGGCTGTTGGAA GCT 1008 

Qy 824 CAAGGATGTACACAGAAGGCCATCAAATCTATATACACACTGACACGGCCTCTGGCCTTT 883

I I i II I M I II I I I I I I I I I I I I I M I M M I I 

Db 1009 GACTGCCGAGTACTGAACATTGTCAACGTGGTCTATAAAGTGACTCGGCCCCTGGCCAGT 1068 

Qy 884 CTGAACAGTGCCATCAATCCCATCTTCTACTTCCTCATGGGAGACCATTACAGAGAGATG 943

I I I I I II MM I II II I I I II M I I I I I I I I II I 
Db 1069 GCCAACAGCTGCCTGGATCCTGTGCTCTACTTGCTCACTGGGGACAAATATCGACGTCAG 1128 

Qy 944 CT 945 

I I 

Db 1129 CT 1130 



RESULT 4 

US-09-016-434-1456 

Sequence 1456, Application US/09016434 
Patent No. 6500938 
GENERAL INFORMATION: 

APPLICANT: Janice Au-Young 
APPLICANT: Jeffrey J. Seilhamer 

TITLE OF INVENTION: COMPOSITION FOR THE DETECTION OF SIGNALING 
TITLE OF INVENTION: PATHWAY GENE EXPRESSION 
NUMBER OF SEQUENCES: 1490
CORRESPONDENCE ADDRESS: 

ADDRESSEE: INCYTE PHARMACEUTICALS, INC.
STREET: 3174 PORTER DRIVE 



CITY: PALO ALTO 

STATE: CALIFORNIA 

COUNTRY: USA 

ZIP: 94304 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS
; SOFTWARE: Word Perfect 6.1 for Windows/MS-DOS 6.2 

CURRENT APPLICATION DATA: 
; APPLICATION NUMBER: US/09/016,434

; FILING DATE: HEREWITH 

CLASSIFICATION: 
; PRIOR APPLICATION DATA: 

APPLICATION NUMBER: 
; FILING DATE: 

; CLASSIFICATION: 

ATTORNEY/AGENT INFORMATION: 

NAME: Zeller, Karen J. 

REGISTRATION NUMBER: 37,071 

REFERENCE/ DOCKET NUMBER: PA-0002 US
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (650) 855-0555 

TELEFAX: (650) 845-4166 
INFORMATION FOR SEQ ID NO: 1456: 
SEQUENCE CHARACTERISTICS : 
; LENGTH: 3055 base pairs 

; TYPE: nucleic acid 

STRANDEDNESS: single 
; TOPOLOGY: linear 

; IMMEDIATE SOURCE: 

LIBRARY: GENBANK

CLONE: g798835 
US-09-016-434-1456



Query Match 5.6%; Score 86.4; DB 4; Length 3055; 

Best Local Similarity 46.1%; Pred. No. 3.1e-14; 

Matches 402; Conservative 0; Mismatches 461; Indels 9; Gaps 3; 

Qy 80 ACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTT 139

II It I III I I I I I I I I I III III II I i I I t I 

Db 982 ACCAAGACGGGCTTCCAGTTTTACTACCTGCCGGCTGTCTACATCTTGGTATTCATCATC 1041 



Qy 140 GGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAAC 199

IE I 1 I I I I I I I II I I I II I I I I I I I I I I I I I I I 

Db 1042 GGCTTCCTGGGCAACAGCGTGGCCATCTGGATGTTCGTCTTCCACATGAAGCCCTGGAGC 1101 



Qy 200 AGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTT 259

I I I I I I I I I I I II I I I I I I I I I Mil II ! I 

Db 1102 GGCATCTCCGTGTACATGTTCAATTTGGCTCTGGCCGACTTCTTGTACGTGCTGACTCTG 1161 



Qy 2 60 C C CAT C CT GAT AAAGAGT TAT G C C AAT GAT A AGGGGACCTATGGAGATGTTCTCTGT 316 

II I I I I I I II I I I I I I I I I I I I 11 I I I I I I I I 

Db 1162 CCAGCCCTGATCTTCTACTACTTCAATAAAACAGACTGGATCTTCGGGGATGCCATGTGT 1221 

Qy 317 ATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTC 376

II I II I I I I I I II I I I I I I I I I I I I I I I I I I I 



Db 1222 AAACTGCAGAGGTTCATCTTTCATGTGAACCTCTA TGGCATCTTGTTTCTGACATGC 1278



Qy 377 ATTAGCATGGACCGATATCTGCTCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAG 436

I I I 1 II I I I I II MINI II I I I I I 

Db 1279 ATCAGTGCCCACCGGTACAGCGGTGTGGTGTACCCCCTCAAGTCCCTGGGCCGGCTCAAA 1338

Qy 437 AAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTA 496 

I I I I I I I I I I III II I I I I I I I I I I II I 

Db 1339 AAGAAGAATGCGATCTGTATCAGCGTGCTGGTGTGGCTCATTGTGGTGGTGGCGATCTCC 1398

Qy 4 97 CCCATGCTCACTTTCATCAATTCT GT C C C AAAAGAAGAG GG C AGT AACT G CAT C GAC 553 

M I I I I I I II II I I I I I I I I I II 1 I I I I I I I 

Db 1399 CCCATCCTCTTCTACTCAGGTACCGGGGTCCGCTVAAAACAAAACCATCACCTGTTACGAC 14 58 

Qy 554 TATGCAAGTTCTGGAAACCCTGAACACAATCTCATTTACAGCCTCTGCCTGACTTTGTTG 613 

I I III I I I I I I I I I I I I I I I I i III II 

Db 1459 ACCACCTCAGACGAGTACCTGCGAAGTTATTTCATCTACAGCATGTGCACGACCGTGGCC 1518

Qy 614 GGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTA 673

Ml till I II I I I I I I I I I I II 

Db 1519 ATGTTCTGTGTCCCCTTGGTGCTGATTCTGGGCTGTTACGGATTAATTGTGAGAGCTTTG 1578

Qy 674 AAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACAAACCCCAACGCCTGGTG 733

I I I I I I I I I II I II I I 

Db 1579 ATTTACAAAGATCTGGACAACTCTCCTCTGAGGAGAAAATCGATTTACCTGGTAATCATT 1638

Qy 734 GTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATCATATCATGCGCAATTTG 793

! I I I I I I I I I I I I I II I III I I I I I 

Db 1639 GTACTGACTGTTTTTGCTGTGTCTTACATCCCTTTCCATGTGATGAAAACGATGAACTTG 1698 

Qy 794 AGGATCGCCTCACGCCTGGATAGTTGGCCACAAGGATGTACACAGAAGGCCATCAAATCT 853

I I I I III I I I I II III 

Db 1699 AGGGCCCGGCTTGATTTTCAGACCCCAGCAATGTGTGCTTTCAATGACAGGGTTTATGCC 1758

Qy 854 ATATACACACTGACACGGCCTCTGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTAC 913

I II I I I I I I I I I I I I I I I I I I I I I I I I I II I II I 

Db 1759 ACGTATCAGGTGACAAGAGGTCTAGCAAGTCTCAACAGTTGTGTGGACCCCATTCTCTAT 1818

Qy 914 TTCCTCATGGGAGACCATTACAGAGAGATGCT 945

II I 1 i I I I I I I I I I I I II II 

Db 1819 TTCTTGGCGGGAGATACTTTCAGAAGGAGACT 1850



RESULT 5 

US-09-016-434-1482 

; Sequence 1482, Application US/09016434 

; Patent No. 6500938 

; GENERAL INFORMATION : 

APPLICANT: Janice Au-Young 

APPLICANT: Jeffrey J. Seilhamer

TITLE OF INVENTION: COMPOSITION FOR THE DETECTION OF SIGNALING 
TITLE OF INVENTION: PATHWAY GENE EXPRESSION 
NUMBER OF SEQUENCES: 1490
CORRESPONDENCE ADDRESS: 

ADDRESSEE: INCYTE PHARMACEUTICALS, INC. 

STREET: 3174 PORTER DRIVE

CITY: PALO ALTO 



STATE : CALIFORNIA 
; COUNTRY: USA 

ZIP : 94304 
COMPUTER READABLE FORM: 
; MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 
; OPERATING SYSTEM: PC-DOS/MS-DOS 

; SOFTWARE: Word Perfect 6.1 for Windows/MS-DOS 6.2 

CURRENT APPLICATION DATA: 
; APPLICATION NUMBER: US/09/016,434

FILING DATE: HEREWITH 

CLASSIFICATION: 
PRIOR APPLICATION DATA: 
; APPLICATION NUMBER: 

FILING DATE: 

CLASSIFICATION: 
; ATTORNEY/AGENT INFORMATION: 

; NAME: Zeller, Karen J. 

REGISTRATION NUMBER: 37,071 

REFERENCE/ DOCKET NUMBER: PA-0002 US 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (650) 855-0555 

TELEFAX: (650) 845-4166 
; INFORMATION FOR SEQ ID NO: 1482:
SEQUENCE CHARACTERISTICS: 

LENGTH: 2025 base pairs 

TYPE: nucleic acid 

STRANDEDNESS: single 

TOPOLOGY: linear 
IMMEDIATE SOURCE: 

LIBRARY: GENBANK 
; CLONE: g984506 

US-09-016-434-1482 

Query Match 5.5%; Score 85.4; DB 4; Length 2025; 

Best Local Similarity 46.5%; Pred. No. 4.7e-14; 

Matches 389; Conservative 0; Mismatches 436; Indels 12; Gaps 3;
Qy 91 CTTGAATAAGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGG 150

Db 335 CTTCAAGTACGTGCTGCTGCCTGTGTCCTACGGCGTGGTGTGCGTGCTTGGGCTGTGTCT 394

Qy 151 GAATGTCACTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGT 210

III I I I I I I I I I I I I I I I I I I I I I I I I I II

Db 395 GAACGCCGTGGCGCTCTACATCTTCTTGTGCCGCCTCAAGACCTGGAATGCGTCCACCAC 454

Qy 211 CTATCTTTTTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGAT 270

Db 455 ATATATGTTCCACCTGGCTGTGTCTGATGCACTGTATGCGGCCTCCCTGCCGCTGCTGGT 514

Qy 271 AAAGAGT T AT G C C AAT GATAAG GGGAC C TAT GGAGAT GTT CT C T GT AT AAG C AAC C G 327

I II I I I I I I I I I I I I I I I I I I I I ! I

Db 515 CTATTACTACGCCCGCGGCGACCACTGGCCCTTCAGCACGGTGCTCTGCAAGCTGGTGCG 574

Qy 328 ATATGTGCTTCACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGA 387

I I I I I I I I 1 I I I I I I! I II I I f I I I I I I I I I I I I I I I I I I I I I I

Db 575 CTTCCTCTTCTACACCAACCTTTACTGCAGCATCCTCTTCCTCACCTGCATCAGCGTGCA 634



Qy 388 CCGATATCTGCTCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGC 447

Mi I I I I I I I I I I I I I I I I I I 

Db 635 CCGGTGTCTGGGCGTCTTACGACCTCTGCGCTCCCTGCGCTGGGGCCGGGCCCGCTACGC 694 

Qy 448 CATTTTAATCTCGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCAC 507 

I 1 I I I II ! I I I I I I I I I I I I I I I I 

Db 695 TCGCCGGGTGGCCGGGGCCGTGTGGGTGTTGGTGCTGGCCTGCCAGGCCCCCGTGCTCTA 754 

Qy 508 TTTCATCAATTCTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGG 567 

I I I I I I II II II I I I I I I I I I I II 

Db 755 CTTTGTCACCACCAGCGCGCGCGGGGGCCGCGTAACCTGCCACGACACCTCGGCACCCGA 814

Qy 568 AAACCCTGAACACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCC 627

I II I I f I I I I I I I I I I I I I III 

Db 815 GCTCTTCAGCCGCTTCGTGGCCTACAGCTCAGTCATGCTGGGCCTGCTCTTCGCGGTGCC 874

Qy 628 TCTCTCTGTGATGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCA 687

I I I I I I IN III I I 1 I I II II 

Db 875 CTTTGCCGTCATCCTTGTCTGTTACGTGCTCATGGCTCGGCGACTGCTAAAGCCAGCCTA 934 

Qy 688 GCAGCAAGCAACTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGT 747 

I ! I I I I I I I I I I I I I I III I I I I I I I I 

Db 935 CGGGACCTCGGGCGGCCTCCCTAGGGCCAAGCGCAAGTCCGTGCGCACCATCGCCGTGGT 994

Qy 74 8 G AT C TT CT CT AT ACT C T T C AC AC C CT AT CAT AT C AT GC G CAAT T T GAGGAT C GC 801 

I I I I I I I I I I I I III II III MM I II 

Db 995 GCTGGCTGTCTTCGCCCTCTGCTTCCTGCCATTCCACGTCACCCGCACCCTCTACTACTC 1054 

Qy 802 CTCACGCCTGGATAGTTGGCCACAAGGATGTACACAGAAGGCCATCAAATCTATATACAC 861

Mllll I I M I II I I I I II I I I I I I 

Db 1055 CTTCCGCTCGCTGG AC CT C AG CT G C C AC AC C C T C AAC G C CAT C AAC AT GG C C T AC AA 1111 

Qy 862 ACTGACACGGCCTCTGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTCCT 918

I II M II I I II I II I II I I I I I I I III I II I I I I I I I I 

Db 1112 GGTTACCCGGCCGCTGGCCAGTGCTAACAGTTGCCTTGACCCCGTGCTCTACTTCCT 1168 



RESULT 6 

US-09-016-434-1108 

Sequence 1108, Application US/09016434 
Patent No. 6500938 
GENERAL INFORMATION: 

APPLICANT: Janice Au-Young 
APPLICANT: Jeffrey J. Seilhamer 

TITLE OF INVENTION: COMPOSITION FOR THE DETECTION OF SIGNALING 
TITLE OF INVENTION: PATHWAY GENE EXPRESSION 
NUMBER OF SEQUENCES: 1490
CORRESPONDENCE ADDRESS: 

ADDRESSEE: INCYTE PHARMACEUTICALS, INC. 
STREET: 3174 PORTER DRIVE 
CITY: PALO ALTO 
STATE: CALIFORNIA 
COUNTRY: USA 
ZIP: 94304 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 



COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: Word Perfect 6.1 for Windows/MS-DOS 6.2 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/09/016,434 
FILING DATE: HEREWITH 
CLASSIFICATION: 
PRIOR APPLICATION DATA: 
APPLICATION NUMBER: 
FILING DATE: 
CLASSIFICATION: 
ATTORNEY/AGENT INFORMATION: 
NAME: Zeller, Karen J. 
REGISTRATION NUMBER: 37,071 
REFERENCE/ DOCKET NUMBER: PA-0002 US 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (650) 855-0555 
TELEFAX: (650) 845-4166 
INFORMATION FOR SEQ ID NO: 1108: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 1571 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: single 
TOPOLOGY: linear 
IMMEDIATE SOURCE: 
LIBRARY: GENBANK 
CLONE: g1296659
US-09-016-434-1108 

Query Match 5.4%; Score 82.8; DB 4; Length 1571; 

Best Local Similarity 46.2%; Pred. No. 2.2e-13; 

Matches 390; Conservative 0; Mismatches 442; Indels 12; Gaps 3; 

Qy 89 ATCTTGAATAAGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTT 148

I I I I I I I II I I I 111 I III M I t I I I 

Db 343 AACTTCAAGCAACTGCTGCTGCCACCTGTGTATTCGGCGGTGCTGGCGGCTGGCCTGCCG 402

Qy 149 GGGAATGTCACTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAAT 208

IN II II I t I II II I I I I I I I I 

Db 403 CTGAACATCTGTGTCATTACCCAGATCTGCACGTCCCGCCGGGCCCTGACCCGCACGGCC 462

Qy 209 GTCTATCTTTTTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTG 268

I ! II I I I I I I I I' I I I II I I I M I I I 11 I I I I I I 

Db 463 GTGTACACCCTAAACCTTGCTCTGGCTGACCTGCTATATGCCTGCTCCCTGCCCCTGCTC 522

Qy 2 69 AT AAAGAGT T AT G C C AA T GAT AAGG GGAC CT AT G GAGAT GT T C T CT GT AT AAG CAAC 325 

II II I I I I I I I I I I I I I I I I I I I I II I III I 

Db 523 ATCTACAACTATGCCCAAGGTGATCACTGGCCCTTTGGCGACTTCGCCTGCCGCCTGGTC 582 

Qy 326 CGATATGTGCTTCACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATG 385

III I I I I I I I I I I II I I M I I I I I 1 I I I I I 1 I I I 1 I I M I I 

Db 583 CGCTTCCTCTTCTATGCCAACCTGCACGGCAGCATCCTCTTCCTCACCTGCATCAGCTTC 642

Qy 386 G AC C GAT AT C T G C T CAT G AAGT AC C C T T T C C GAGAACACT T T CT AC AAAAGAAG GAA .4 42 

I I I II i I I Ml I I I I I II III I 

Db 643 CAGCGCTACCTGGGCATCTGCCACCCGCTGGCCCCCTGGCACAAACGTGGGGGCCGCCGG 702



Qy 443 TTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATG 502

Mil I I I 1 I 1 I I I I I I I I 1 I I I I I M I I 

Db 703 GCTGCCTGGCTAGTGTGTGTAGCCGTGTGGCTGGCCGTGACAACCCAGTGCCTGCCCACA 762

Qy 503 CTCACTTTCATCAATTCTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGT 562

I I I I I III I I I I I I I I I I Ml 

Db 763 GCCATCTTCGCTGCCACAGGCATCCAGCGTAACCGCACTGTCTGCTATGACCTCAGCCCG 822 

Qy 563 TCTGGAAACCCTGAACACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTA 622 

Ml I II I I I I II II I II I I I I I II II II I I 

Db 823 CCTGCCCTGGCCACCCACTATATGCCCTATGGCATGGCTCTCACTGTCATCGGCTTCCTG 882

Qy 623 ATTCCTCTCTCTGTGATGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGG 682 

I II I I I I II I I M I I I III III I I 

Db 883 CTGCCCTTTGCTGCCCTGCTGGCCTGCTACTGTCTCCTGGCCTGCCGCCTGTGCCGCCAG 942 

Qy 683 AGC C AG C AG C AAG C AAC T G CCCTGCCACTGGACAAACCCCAACGCCTGGTGGTC 736 

II II III II I I III - I I I I I I I II 

Db 943 GATGGCCCGGCAGAGCCTGTGGCCCAGGAGCGGCGTGGCAAGGCGGCCCGCATGGCCGTG 1002

Qy 737 CTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATCATATCATGCGCAATTTGAGG 796 

I I I I I II II I III MM II I I M I I II I 

Db 1003 GTGGTGGCTGCTGCCTTTGCCATCAGCTTCCTGCCTTTTCACATCACCAAGACAGCCTAC 1062 

Qy 797 ATCGCCTCACGCCTGGATAGTTGGCCACAAGGATGTACACAGAAGGCCATCAAATCTATA 856

III Mil I I I I I I I I M I I II 

Db 1063 CTGGCAGTGCGCTCGACGCCGGGCGTCCCCTGCACTGTATTGGAGGCCTTTGCAGCGGCC 1122 

Qy 857 TACACACTGACACGGCCTCTGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTC 916 

I I I I I II II II I I I I I I I I I II I I I I II I II I I I I I I 

Db 1123 TACAAAGGCACGCGGCCGTTTGCCAGTGCCAACAGCGTGCTGGACCCCATCCTCTTCTAC 1182

Qy 917 CTCA 920 

I I I 

Db 1183 TTCA 1186 



RESULT 7 

US-08-405-271A-18 

; Sequence 18, Application US/08405271A 

; Patent No. 6432652 

; GENERAL INFORMATION: 

APPLICANT: EVANS, CHRISTOPHER J.

APPLICANT: KEITH, DUANE E. 

TITLE OF INVENTION: OPIOID RECEPTOR GENES 
NUMBER OF SEQUENCES: 25 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: MORRISON & FOERSTER 

STREET: 2000 PENNSYLVANIA AVENUE, NW, Suite 5500 
CITY: WASHINGTON 
STATE: DC 
COUNTRY: USA 
ZIP: 20006-1888 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 



SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/405,271A

FILING DATE: 14-MAR-1995 

CLASSIFICATION: 435 
ATTORNEY/AGENT INFORMATION : 

NAME: MURASHIGE, KATE H. 

REGISTRATION NUMBER: 29,959 

REFERENCE/ DOCKET NUMBER: 22000-20526.22 
; TELECOMMUNICATION INFORMATION: 

; TELEPHONE: (202) 887-1500 

; TELEFAX: (202) 887-0763 

TELEX: 90-4030 MRSNFOERSWSH 
; INFORMATION FOR SEQ ID NO: 18: 
; SEQUENCE CHARACTERISTICS: 

LENGTH: 1805 base pairs 

TYPE: nucleic acid 

STRANDEDNESS: double

TOPOLOGY: linear
FEATURE:
; NAME/KEY: CDS

LOCATION: 10..1119
US-08-405-271A-18 



Query Match 5.3%; Score 82.2; DB 4; Length 1805;

Best Local Similarity 44.5%; Pred. No. 3.5e-13; 

Matches 379; Conservative 0; Mismatches 463; Indels 9; Gaps 1;

Qy 85 GGCTATCTTGAATAAGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACT 144

I I I I I III I III I III I I I I I I I I I 

Db 147 GCCCCTCGGGCTCAAGGTCACCATCGTGGGGCTCTACCTGGCCGTGTGTGTCGGAGGGCT 206 

Qy 145 GCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAG 204 

I I I I I I I I ill I I I I I I I I I I • II III II II 

Db 207 CCTGGGGAACTGCCTTGTCATGTACGTCATCCTCAGGCACACCAAAATGAAGACAGCCAC 266

Qy 205 CAATGTCTATCTTTTTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCAT 264

II I I I I I I I I I I I I I I III I I I I I I I I I I I I I I I M I 

Db 267 CAATATTTACATCTTTAACCTGGCCCTGGCCGACACTCTGGTCCTGCTGACGCTGCCCTT 326

Qy 265 CCTGATAAAGAGTTATGCCAATGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAA 324

III II I I II I I III III II II I I 

Db 327 CCAGGGCACGGACATCCTCCTGGGCTTCTGGCCGTTTGGGAATGCGCTGTGCAAGACAGT 386



Qy 325 CCGATATGTGCTTCACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCAT 384

I I II I II I I I I I II II I I I I I I II I I I I I I I I I 

Db 387 CATTGCCATTGACTACTACAACATGTTCACCAGCACCTTCACCCTAACTGCCATGAGTGT 446

Qy 385 GGACCGATATCTGCTCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATT 444

I I I I I I I I I III I I I I I I I I I II II II 

Db 447 GGATCGCTATGTAGCCATCTGCCACCCCATCCGTGCCCTCGACGTCCGCACGTCCAGCAA 506



Qy 445 TGCCATTTTAATCTCGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCT 504

III II I I I I I I I I I I I I I I I II I I I I I I I 

Db 507 AGCCCAGGCTGTCAATGTGGCCATCTGGGCCCTGGCCTCTGTTGTCGGTGTTCCCGTTGC 566



Qy 505 CACTTTCATCAATTCTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTC 564



II I I I I II II I I I III I I I II 

Db 567 CATCATGGGCTCGGCACAGGTCGAGGATGAAGAGATCGAGTGCCTGGTGGAGATCCCTAC 626

Qy 565 TGGAAACCCTGAACACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAAT 624

I II I II I I I I M I III I I I I i I 

Db 627 CCCTCAGGATTACTGGGGCCCGGTGTTTGCCATCTGCATCTTCCTCTTCTCCTTCATCGT 686

Qy 625 TCCTCTCTCTGTGATGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAG 684

III I I I I I III MM I I I I I I I I 

Db 687 CCCCGTGCTCGTCATCTCTGTCTGCTACAGCCTCATGATCCGGCGGCTCCGTGGAGTCCG 746

Qy 685 CCAGCAGCAAGCAACTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGT 744

I I I I I 1 I I I I I I I I I I I I I I I 

Db 747 CCTGCTCTCGGGCTCCCGAGAGAAGGACCGGAACCTGCGGCGCATCACTCGGCTGGTGCT 806

Qy 745 TGTGATCTTCTCTATACTCTTCACACCCTATCATATCATGCGCAATTTGAGGATCGCCTC 804

III I I II I II I II II II I I I I I 

Db 807 GGTGGTAGTGGCTGTGTTCGTGGGCTGCTGGACGCCTGTCCAGGTCTTCGTGCTGGCCCA 866 

Qy 805 ACGCCTGGATAGTTGGCCACAAGGATGTACACAGAAGGCCATCAAATCTATATACACACT 864

I I I I I I I 1 I i I II MINI I I I I I 
Db 867 AGGGCTGGGGGTTCAGCCGAGCAGCGAGACTGCCGTGGCCATTCTGCGCTTCTGCAC 923

Qy 865 GACACGGCCTCTGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTCCTCATGGG 924

! I I II I II I I I I I I I I I I I I I I I M I I I I I I II I I I 
Db 924 GGCCCTGGGCTACGTCAACAGCTGCCTCAACCCCATCCTCTACGCCTTCCTGGA 977

Qy 925 AGACCATTACA 935 

II llll 

Db 978 TGAGAACTTCA 988



RESULT 8 

US-09-016-434-1391 

Sequence 1391, Application US/09016434 
Patent No. 6500938 
GENERAL INFORMATION: 

APPLICANT: Janice Au-Young 
APPLICANT: Jeffrey J. Seilhamer 

TITLE OF INVENTION: COMPOSITION FOR THE DETECTION OF SIGNALING 
TITLE OF INVENTION: PATHWAY GENE EXPRESSION 
NUMBER OF SEQUENCES: 1490
CORRESPONDENCE ADDRESS: 

ADDRESSEE: INCYTE PHARMACEUTICALS, INC. 
STREET: 3174 PORTER DRIVE
CITY: PALO ALTO
STATE: CALIFORNIA 
COUNTRY: USA 
ZIP: 94304 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: Word Perfect 6.1 for Windows/MS-DOS 6.2
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/09/016,434
FILING DATE: HEREWITH 



CLASSIFICATION: 
PRIOR APPLICATION DATA: 
APPLICATION NUMBER: 
FILING DATE: 
CLASSIFICATION: 
ATTORNEY/AGENT INFORMATION: 
NAME: Zeller, Karen J. 
REGISTRATION NUMBER: 37,071 
REFERENCE/DOCKET NUMBER: PA-0002 US 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (650) 855-0555 
TELEFAX: (650) 845-4166 
INFORMATION FOR SEQ ID NO: 1391: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 1973 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: single
TOPOLOGY: linear 
IMMEDIATE SOURCE: 
LIBRARY: GENBANK 
CLONE: g471316
US-09-016-434-1391 

Query Match 5.3%; Score 82.2; DB 4; Length 1973; 

Best Local Similarity 44.5%; Pred. No. 3.7e-13; 

Matches 379; Conservative 0; Mismatches 463; Indels 9; Gaps 1; 

Qy 85 GGCTATCTTGAATAAGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACT 144

I f i I I I I I I I I I I III II II I It I I 

Db 315 GCCCCTCGGGCTCAAGGTCACCATCGTGGGGCTCTACCTGGCCGTGTGTGTCGGAGGGCT 374 

Qy 145 GCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAG 204 

II I I II I I I 'I I I I I I I I I I I II Ml II I I 

Db 375 CCTGGGGAACTGCCTTGTCATGTACGTCATCCTCAGGCACACCAAAATGAAGACAGCCAC 434

Qy 205 CAATGTCTATCTTTTTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCAT 264

I I I I I I I I I I I I I I I I Ml Mil I I I II I M I 1 I I I I 

Db 435 CAATATTTACATCTTTAACCTGGCCCTGGCCGACACTCTGGTCCTGCTGACGCTGCCCTT 494 

Qy 265 CCTGATAAAGAGTTATGCCAATGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAA 324

III II I I I I I I I II I II I I II I I 

Db 495 CCAGGGCACGGACATCCTCCTGGGCTTCTGGCCGTTTGGGAATGCGCTGTGCAAGACAGT 554

Qy 325 CCGATATGTGCTTCACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCAT 384

I I II I II I I I II I I I II I I I I M I I I I I I I II I 

Db 555 CATTGCCATTGACTACTACAACATGTTCACCAGCACCTTCACCCTAACTGCCATGAGTGT 614

Qy 385 GGACCGATATCTGCTCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATT 444

I I I II I I I I III I I I I I I II I II II II 

Db 615 GGATCGCTATGTAGCCATCTGCCACCCCATCCGTGCCCTCGACGTCCGCACGTCCAGCAA 674 

Qy 445 TGCCATTTTAATCTCGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCT 504

Ml II I I II I I I I I I I I I I I II I I I I I I I 

Db 675 AGCCCAGGCTGTCAATGTGGCCATCTGGGCCCTGGCCTCTGTTGTCGGTGTTCCCGTTGC 734

Qy 505 CACTTTCATCAATTCTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTC 564 

II I I I I I I I I I I I III I I I II 



Db 735 CATCATGGGCTCGGCACAGGTCGAGGATGAAGAGATCGAGTGCCTGGTGGAGATCCCTAC 794



Qy 565 TGGAAACCCTGAACACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAAT 624 

I II I II I I I I I I I III I I I I I I 

Db 795 CCCTCAGGATTACTGGGGCCCGGTGTTTGCCATCTGCATCTTCCTCTTCTCCTTCATCGT 854

Qy 625 TCCTCTCTCTGTGATGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAG 684

III I I I I I I I I I I I I I I I I I I I I 

Db 855 CCCCGTGCTCGTCATCTCTGTCTGCTACAGCCTCATGATCCGGCGGCTCCGTGGAGTCCG 914 

Qy 685 CCAGCAGCAAGCAACTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGT 744 

I I I I I I I I I I I I I I I I 11 I I I 

Db 915 CCTGCTCTCGGGCTCCCGAGAGAAGGACCGGAACCTGCGGCGCATCACTCGGCTGGTGCT 974

Qy 745 TGTGATCTTCTCTATACTCTTCACACCCTATCATATCATGCGCAATTTGAGGATCGCCTC 804

I I I I I I I I I I I II II I I I I ! I 

Db 975 GGTGGTAGTGGCTGTGTTCGTGGGCTGCTGGACGCCTGTCCAGGTCTTCGTGCTGGCCCA 1034

Qy 805 ACGCCTGGATAGTTGGCCACAAGGATGTACACAGAAGGCCATCAAATCTATATACACACT 864

I I I I I I I III I II I I I I I I Mill 
Db 1035 AGGGCTGGGGGTTCAGCCGAGCAGCGAGACTGCCGTGGCCATTCTGCGCTTCTGCAC 1091

Qy 865 GACACGGCCTCTGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTCCTCATGGG 924

I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I 

Db 1092 GGCCCTGGGCTACGTCAACAGCTGCCTCAACCCCATCCTCTACGCCTTCCTGGA 1145 

Qy 925 AGACCATTACA 935

II I I I I 

Db 1146 TGAGAACTTCA 1156



RESULT 9 
US-08-461-244-1 

Sequence 1, Application US/08461244 
Patent No. 5776729 
GENERAL INFORMATION: 

APPLICANT: Soppet, Daniel R.
APPLICANT: Yi, Li 
APPLICANT: Ruben, Steven M. 
APPLICANT: Rosen, Craig A. 

TITLE OF INVENTION: HUMAN G-PROTEIN RECEPTOR HGBER32
NUMBER OF SEQUENCES: 7 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: CARELLA, BYRNE, BAIN, GILFILLAN, CECCHI, 
ADDRESSEE: STUART & OLSTEIN 
STREET: 6 Becker Farm Road 
CITY: Roseland 
STATE: New Jersey 
COUNTRY: USA 
ZIP: 07068 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/461,244



; FILING DATE: 05-JUN-1995 

CLASSIFICATION: 536 
; ATTORNEY/AGENT INFORMATION:

; NAME: Ferraro, Gregory D. 

REGISTRATION NUMBER: 36,134
; REFERENCE/DOCKET NUMBER: 325800-445

; TELECOMMUNICATION INFORMATION: 

TELEPHONE: 201-994-1700 

TELEFAX: 201-994-1744
INFORMATION FOR SEQ ID NO: 1: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 1586 base pairs
TYPE: nucleic acid 
STRANDEDNESS: single
TOPOLOGY: linear 
MOLECULE TYPE: cDNA 
FEATURE: 
; NAME/KEY: CDS

; LOCATION: 431..1495

US-08-461-244-1 

Query Match 5.2%; Score 80; DB 1; Length 1586; 

Best Local Similarity 47.3%; Pred. No. 1.4e-12; 

Matches 276; Conservative 0; Mismatches 305; Indels 3; Gaps 1;

Qy 98 AAGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTC 157 

Mil Ml Ml I I II II II III III I I I II I I I I I 

Db 533 AAGTTGCTCCTTGCTGTCTTTTATTGCCTCCTGTTTGTATTCAGTCTTCTGGGAAACAGC 592 

Qy 158 ACTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTT 217 

II I III I I I II II I II I II I II II M I I II I I 

Db 593 CTGGTCATCCTGGTCCTTGTGGTCTGCAAGAAGCTGAGGAGCATCACAGATGTATACCTC 652 

Qy 218 TTTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGT 277

II I II M II I llllll I III I I I M I II I I I I I I 

Db 653 TTGAACCTGGCCCTGTCTGACCTGCTTTTTGTCTTCTCCTTCCCCTTTCAGACCTA---C 709

Qy 278 TATGCCAATGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTT 337

III llllll I II! Ill I IN I I I 

Db 710 TATCTGCTGGACCAGTGGGTGTTTGGGACTGTAATGTGCAAAGTGGTGTCTGGCTTTTAT 769

Qy 338 CACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTG 397

III I M II II II I M I I II I II I I llllll II II I I I I II I 

Db 770 TACATTGGCTTCTACAGCAGCATGTTTTTCATCACCCTCATGAGTGTGGACAGGTACCTG 829

Qy 398 CTCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATC 457

I 111 II I III I I I I I 

Db 830 GCTGTTGTCCATGCCGTGTATGCCCTAAAGGTGAGGACGATCAGGATGGGCACAACGCTG 889

Qy 458 TCGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAAT 517 

I I II M I I II I II II I II II I I II II 

Db 890 TGCCTGGCAGTATGGCTAACCGCCATTATGGCTACCATCCCATTGCTAGTGTTTTACCAA 949

Qy 518 TCTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAA 577 

I I I II II I I III II II I I 

Db 950 GTGGCCTCTGAAGATGGTGTTCTACAGTGTTATTCATTTTACAATCAACAGACTTTGAAG 1009 



Qy 578 CACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTG 637

II III I I I I I III I I I I I I I I I I I I I III I 

Db 1010 TGGAAGATCTTCACCAACTTCAAAATGAACATTTTAGGCTTGTTGATCCCATTCACCATC 1069

Qy 638 ATGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAG 681 

I Ml I I I I I I I I I I I I I II I I I I 

Db 1070 TTTATGTTCTGCTACATTAAAATCCTGCACCAGCTGAAGAGGTG 1113



RESULT 10 

US-09-016-434-1096 

; Sequence 1096, Application US/09016434 

; Patent No. 6500938 

; GENERAL INFORMATION: 

; APPLICANT: Janice Au-Young 

APPLICANT: Jeffrey J. Seilhamer

TITLE OF INVENTION: COMPOSITION FOR THE DETECTION OF SIGNALING 
TITLE OF INVENTION: PATHWAY GENE EXPRESSION 
NUMBER OF SEQUENCES: 1490
CORRESPONDENCE ADDRESS: 

ADDRESSEE: INCYTE PHARMACEUTICALS, INC. 

STREET: 3174 PORTER DRIVE 

CITY: PALO ALTO 

STATE: CALIFORNIA 

COUNTRY: USA 

ZIP: 94304 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
; COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 
; SOFTWARE: Word Perfect 6.1 for Windows/MS-DOS 6.2 

CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/09/016,434 

FILING DATE: HEREWITH 

CLASSIFICATION: 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: 

FILING DATE: 

CLASSIFICATION: 
ATTORNEY/AGENT INFORMATION: 

NAME: Zeller, Karen J. 

REGISTRATION NUMBER: 37,071 

REFERENCE/DOCKET NUMBER: PA-0002 US 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (650) 855-0555 

TELEFAX: (650) 845-4166
; INFORMATION FOR SEQ ID NO: 1096: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 1953 base pairs 

TYPE: nucleic acid 

STRANDEDNESS: single

TOPOLOGY: linear 
IMMEDIATE SOURCE: 

LIBRARY: GENBANK

CLONE: g1245056
US-09-016-434-1096 



Query Match 5.2%; Score 80; DB 4; Length 1953; 

Best Local Similarity 47.3%; Pred. No. 1.5e-12; 

Matches 276; Conservative 0; Mismatches 305; Indels 3; Gaps 1;



Qy 98 AAGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTC 157 

I I I I III III I I I I I! II III III ! II II MM I 

Db 369 AAGTTGCTCCTTGCTGTCTTTTATTGCCTCCTGTTTGTATTCAGTCTTCTGGGAAACAGC 428

Qy 158 ACTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTT 217 

II I III I I I I II I I I I ! I I I I I I I I I I I M I I 

Db 429 CTGGTCATCCTGGTCCTTGTGGTCTGCAAGAAGCTGAGGAGCATCACAGATGTATACCTC 488

Qy 218 TTTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGT 277 

I I I I I I I I I I I I I M I I III I I I II I III I I II I 

Db 489 TTGAACCTGGCCCTGTCTGACCTGCTTTTTGTCTTCTCCTTCCCCTTTCAGACCTA---C 545

Qy 278 TATGCCAATGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTT 337

III M I I I I I I I I I I I I I I I I I I 

Db 546 TATCTGCTGGACCAGTGGGTGTTTGGGACTGTAATGTGCAAAGTGGTGTCTGGCTTTTAT 605 

Qy 338 CACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTG 397

III I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 606 TACATTGGCTTCTACAGCAGCATGTTTTTCATCACCCTCATGAGTGTGGACAGGTACCTG 665

Qy 398 CTCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATC 457

I III II I III III I 

Db 666 GCTGTTGTCCATGCCGTGTATGCCCTAAAGGTGAGGACGATCAGGATGGGCACAACGCTG 725 

Qy 458 TCGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAAT 517

I I I II I II I I I I I II III I I I I II II 

Db 726 TGCCTGGCAGTATGGCTAACCGCCATTATGGCTACCATCCCATTGCTAGTGTTTTACCAA 785

Qy 518 TCTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAA 577

I I I I I I I I I III II II I i 

Db 786 GTGGCCTCTGAAGATGGTGTTCTACAGTGTTATTCATTTTACAATCAACAGACTTTGAAG 845

Qy 578 CACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTG 637 

I I III I I I I I III I I I I I I I I I I I I I III I 

Db 846 TGGAAGATCTTCACCAACTTCAAAATGAACATTTTAGGCTTGTTGATCCCATTCACCATC 905

Qy 638 ATGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAG 681 

I I I II I I I I I I I I I I I I I I 1 I I I 

Db 906 TTTATGTTCTGCTACATTAAAATCCTGCACCAGCTGAAGAGGTG 949



RESULT 11 
US-08-513-974B-57 

Sequence 57, Application US/08513974B 
Patent No. 6114139 
GENERAL INFORMATION: 



APPLICANT: Hinuma, Shuji
APPLICANT: Hosoya, Masaki
APPLICANT: Fujii, Ryo
APPLICANT: Ohtaki, Tetsuya
APPLICANT: Fukusumi, Shoji
APPLICANT: Ohgi, Kazuhiro

TITLE OF INVENTION: G PROTEIN COUPLED RECEPTOR PROTEIN, 



TITLE OF INVENTION: PRODUCTION, AND USE THEREOF 
NUMBER OF SEQUENCES: 380
CORRESPONDENCE ADDRESS: 

ADDRESSEE: DIKE, BRONSTEIN, ROBERTS & CUSHMAN, LLP 

STREET: 130 Water Street 

CITY: Boston 

STATE: MA 

COUNTRY: USA 

ZIP: 02109 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/513,974B

FILING DATE: 14-SEP-1995 

CLASSIFICATION: 536 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: PCT/JP95/01599

FILING DATE: 10-AUG-1995 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: JP 7-093989 

FILING DATE: 19-AUG-1995 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: JP 7-057186 

FILING DATE: 16-MAR-1995 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: JP 7-007177 

FILING DATE: 20-JAN-1995 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: JP 6-326611 

FILING DATE: 28-DEC-1994 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: JP 6-270017 

FILING DATE: 02-NOV-1994 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: JP 6-236357 

FILING DATE: 30-SEP-1994 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: JP 6-236356 

FILING DATE: 30-SEP-1994 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: JP 6-189274 

FILING DATE: 11-AUG-1994
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: JP 6-189273 

FILING DATE: 11-AUG-1945
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: JP 6-189272 

FILING DATE: 11-AUG-1994
ATTORNEY/AGENT INFORMATION: 

NAME: Resnick, David S. 

REGISTRATION NUMBER: 34,235 

REFERENCE/DOCKET NUMBER: 45753
TELECOMMUNICATION INFORMATION: 

TELEPHONE: 617-523-3400



TELEFAX: 617-523-6440
INFORMATION FOR SEQ ID NO: 57: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 984 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: double 
TOPOLOGY: linear 
MOLECULE TYPE: cDNA 
US-08-513-974B-57 

Query Match 5.2%; Score 79.6; DB 3; Length 984; 

Best Local Similarity 46.0%; Pred. No. 1.4e-12; 

Matches 388; Conservative 0; Mismatches 444; Indels 12; Gaps 3;

Qy 89 ATCTTGAATAAGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTT 148

I I I I I I I II I I I III I ill I I I I I I I 

Db 67 AACTTCAAGCAACTGCTGCTGCCACCTGTGTATTCGGCGGTGCTGGCGGCTGGCCTGCCG 126 

Qy 149 GGGAATGTCACTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAAT 208

III II I I I I I I I i I .1 I I I I I I I 

Db 127 CTGAACATCTGTGTCATTACCCAGATCTGCACGTCCCGCCGGGCCCTGACCCGCACGGCC 186

Qy 209 GTCTATCTTTTTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTG 268

I I I I I I I I I I I I I 1 I I I I I I I I I I I II I I I I I i 

Db 187 GTGTACACCCTAAACCTTGCTCTGGCTGACCTGCTATATGCCTGCTCCCTGCCCCTGCTC 246

Qy 269 ATAAAGAGTTATGCCAA---TGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAAC 325

II II I I I I I I I I I I I I I I I I I I I I I I I 111 I 

Db 247 ATCTACAACTATGCCCAAGGTGATCACTGGCCCTTTGGCGACTTCGCCTGCCGCCTGGTC 306

Qy 326 CGATATGTGCTTCACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATG 385 

III I I I I I I I I I I II II I I II I I II I I I I I I I I I I I I I II I 

Db 307 CGCTTCCTCTTCTATGCCAACCTGCACGGCAGCATCCTCTTCCTCACCTGCATCAGCTTC 366 

Qy 386 GAC C GATAT CT GCT CATGAAGT AC C CTT T C CGAGAACACTTT CTACAAAAGAAGGAA 4 42 

. I I I I I I M III MM I II Ml I 

Db 367 CAGCGCTACCTGGGCATCTGCCACCCGCTGGCCCCCTGGCACAAACGTGGGGGCCGCCGG 426

Qy 443 TTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATG 502

II I I I I I I I I II I II Mill I M I I II 

Db 427 GCTGCCTGGCTAGTGTGTGTAACCGTGTGGCTGGCCGTGACAACCCAGTGCCTGCCCACA 486

Qy 503 CTCACTTTCATCAATTCTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGT 562 

II ill III I I I I I I I I II Ml 

Db 487 GCCATCTTCGCTGCCACAGGCATCCAGCGTAACCGCACTGTCTGCTATGACCTCAGCCCG 546

Qy 563 TCTGGAAACCCTGAACACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTA 622 

III I II I II I II III II II I I I I II I I I M 

Db 547 CCTGCCCTGGCCACCCACTATATGCCCTATGGCATGGCTCTCACTGTCATCGGCTTCCTG 606 

Qy 623 ATTCCTCTCTCTGTGATGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGG 682 

I II I I I I II I I I I I IE Ml III I I 

Db 607 CTGCCCTTTGCTGCCCTGCTGGCCTGCTACTGTCTCCTGGCCTGCCGCCTGTGCCGCCAG 666 

Qy 683 AGCCAGCAGCAAGCAACTG------CCCTGCCACTGGACAAACCCCAACGCCTGGTGGTC 736

II II Ml II I I I M I I II I II II 

Db 667 GATGGCCCGGCAGAGCCTGTGGCCCAGGAGCGGCGTGGCAAGGCGGCCCGCATGGCCGTG 726



Qy 737 CTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATCATATCATGCGCAATTTGAGG 796

I I I I I I I I I I III I I I I I I I I I I I I I I I 

Db 727 GTGGTGGCTGCTGCCTTTGCCATCAGCTTCCTGCCTTTTCACATCACCAAGACAGCCTAC 786

Qy 797 ATCGCCTCACGCCTGGATAGTTGGCCACAAGGATGTACACAGAAGGCCATCAAATCTATA 856

I I I III i I I I I I I I I I I I M 

Db 787 CTGGCAGTGGGCTCGACGCCGGGCGTCCCCTGCACTGTATTGGAGGCCTTTGCAGCGGCC 846

Qy 857 TACACACTGACACGGCCTCTGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTC 916

Mill I I I I I I I I I I I I Mill! I I I I I I I I I I I I I I 

Db 847 TACAAAGGCACGCGGCCGTTTGCCAGTGCCAACAGCGTGCTGGACCCCATCCTCTTCTAC 906

Qy 917 CTCA 920 

III 

Db 907 TTCA 910 



RESULT 12 
US-09-461-436B-57 

; Sequence 57, Application US/09461436B 
; Patent No. 6538107 

GENERAL INFORMATION: 

APPLICANT: Shuji Hinuma 
; Yasuaki Ito 

; Ryo Fujii 

TITLE OF INVENTION: G Protein Coupled Receptor Protein, 
; Production, And Use Thereof

; NUMBER OF SEQUENCES: 61 

; CORRESPONDENCE ADDRESS: 

ADDRESSEE: Edwards & Angell, LLP 

STREET: 101 Federal Street 
; CITY: BOSTON 

STATE: MA 

COUNTRY: USA 

ZIP: 02209 
COMPUTER READABLE FORM: 
; MEDIUM TYPE: Floppy disk

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)
; CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/09/461,436B

FILING DATE: 14-Dec-1999 

CLASSIFICATION: <Unknown> 
PRIOR APPLICATION DATA:

APPLICATION NUMBER: 08/513,974 

FILING DATE: 14-SEP-1995 

APPLICATION NUMBER: PCT/JP95/01599

FILING DATE: 10-AUG-1995 

APPLICATION NUMBER: 7-093989 
; FILING DATE: 19-APR-1995 

APPLICATION NUMBER: 7-057186 

FILING DATE: 16-MAR-1995 

APPLICATION NUMBER: 7-007177 

FILING DATE: 20-JAN-1995 

APPLICATION NUMBER: 6-326611 



FILING DATE: 28-DEC-1994 
APPLICATION NUMBER: 6-270017 
FILING DATE: 02-NOV-1994 
APPLICATION NUMBER: 6-236357 
FILING DATE: 30-SEP-1994 
APPLICATION NUMBER: 6-236356
FILING DATE: 30-SEP-1994
APPLICATION NUMBER: 6-189274
FILING DATE: 11-AUG-1994
APPLICATION NUMBER: 6-189273
FILING DATE: 11-AUG-1994
APPLICATION NUMBER: 6-189272
FILING DATE: 11-AUG-1994

ATTORNEY/AGENT INFORMATION: 
NAME: CONLIN, DAVID G. 
REGISTRATION NUMBER: <Unknown> 
REFERENCE/DOCKET NUMBER: 45753 DIV2

TELECOMMUNICATION INFORMATION: 
TELEPHONE: 617-439-4444 
TELEFAX: 617-439-4170 
INFORMATION FOR SEQ ID NO: 57: 

SEQUENCE CHARACTERISTICS: 

LENGTH: 984 base pairs 
TYPE: nucleic acid 
STRANDEDNESS: double 
TOPOLOGY: linear 

MOLECULE TYPE: cDNA 

SEQUENCE DESCRIPTION: SEQ ID NO: 57:
US-09-461-436B-57 



Query Match 5.2%; Score 79.6; DB 4; Length 984; 

Best Local Similarity 46.0%; Pred. No. 1.4e-12; 

Matches 388; Conservative 0; Mismatches 444; Indels 12; Gaps 3;

Qy 89 ATCTTGAATAAGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTT 148

I I I I I I 1 I I ! E I I ! I I III MINI: 

Db 67 AACTTCAAGCAACTGCTGCTGCCACCTGTGTATTCGGCGGTGCTGGCGGCTGGCCTGCCG 126



Qy 149 GGGAATGTCACTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAAT 208

I I I I I I I I I I I I J I I I I I I I I I 

Db 127 CTGAACATCTGTGTCATTACCCAGATCTGCACGTCCCGCCGGGCCCTGACCCGCACGGCC 186



Qy 209 GTCTATCTTTTTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTG 268 

II M I I I I I I I I I II I I I I I M I I I I I M I I I I 

Db 187 GTGTACACCCTAAACCTTGCTCTGGCTGACCTGCTATATGCCTGCTCCCTGCCCCTGCTC 246



Qy 269 ATAAAGAGTTATGCCAA---TGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAAC 325

II II 1 I I I I I I I I I I I I I I I I I I I II I III I 

Db 247 ATCTACAACTATGCCCAAGGTGATCACTGGCCCTTTGGCGACTTCGCCTGCCGCCTGGTC 306

Qy 326 CGATATGTGCTTCACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATG 385

II I I I I Ml,!!, II I I II I M I I I I I M I I I I I I I I I I I I 

Db 307 CGCTTCCTCTTCTATGCCAACCTGCACGGCAGCATCCTCTTCCTCACCTGCATCAGCTTC 366 



Qy 386 GACCGATATCTGCTCATGAAGTACCCTTT C C GAGAACAC TT T CT ACAAAAGAAG GAA 442 

I II II III III I I I I I II III I 

Db 367 CAGCGCTACCTGGGCATCTGCCACCCGCTGGCCCCCTGGCACAAACGTGGGGGCCGCCGG 426



Qy 443 TTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATG 502

ill | I I I E I I I I I I I I I I I I I I I I I I 

Db 427 GCTGCCTGGCTAGTGTGTGTAACCGTGTGGCTGGCCGTGACAACCCAGTGCCTGCCCACA 486

Qy 503 CTCACTTTCATCAATTCTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGT 562

I I I I I III I I I I I I I I I I M I 

Db 487 GCCATCTTCGCTGCCACAGGCATCCAGCGTAACCGCACTGTCTGCTATGACCTCAGCCCG 546

Qy 563 TCTGGAAACCCTGAACACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTA 622 

Ml I I II I I t IE III I I I I I I I I I I I I I I I 

Db 547 CCTGCCCTGGCCACCCACTATATGCCCTATGGCATGGCTCTCACTGTCATCGGCTTCCTG 606 

Qy 623 ATTCCTCTCTCTGTGATGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGG 682

III I 111 II I I I I I I I III III I I 

Db 607 CTGCCCTTTGCTGCCCTGCTGGCCTGCTACTGTCTCCTGGCCTGCCGCCTGTGCCGCCAG 666 

Qy 683 AGCCAGCAGCAAGCAACTG------CCCTGCCACTGGACAAACCCCAACGCCTGGTGGTC 736

I I I I Ml I I I I I I I I I I I I I I I I 

Db 667 GATGGCCCGGCAGAGCCTGTGGCCCAGGAGCGGCGTGGCAAGGCGGCCCGCATGGCCGTG 726 

Qy 737 CTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATCATATCATGCGCAATTTGAGG 796

I I I I I I I III I 1 I I I I I I I I I I I I I I I I 

Db 727 GTGGTGGCTGCTGCCTTTGCCATCAGCTTCCTGCCTTTTCACATCACCAAGACAGCCTAC 786

Qy 797 ATCGCCTCACGCCTGGATAGTTGGCCACAAGGATGTACACAGAAGGCCATCAAATCTATA 856

IN M l I I I I I I Mill I II 

Db 787 CTGGCAGTGGGCTCGACGCCGGGCGTCCCCTGCACTGTATTGGAGGCCTTTGCAGCGGCC 846

Qy 857 TACACACTGACACGGCCTCTGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTC 916

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 847 TACAAAGGCACGCGGCCGTTTGCCAGTGCCAACAGCGTGCTGGACCCCATCCTCTTCTAC 906

Qy 917 CTCA 920 

I I I 

Db 907 TTCA 910 



RESULT 13 

US-08-513-974B-379 

Sequence 379, Application US/08513974B 
Patent No. 6114139 
GENERAL INFORMATION: 

APPLICANT: Hinuma, Shuji 
APPLICANT: Hosoya, Masaki 
APPLICANT: Fujii, Ryo 
APPLICANT: Ohtaki, Tetsuya 
APPLICANT: Fukusumi, Shoji 
APPLICANT: Ohgi, Kazuhiro 

TITLE OF INVENTION: G PROTEIN COUPLED RECEPTOR PROTEIN, 
TITLE OF INVENTION: PRODUCTION, AND USE THEREOF 
NUMBER OF SEQUENCES: 380 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: DIKE, BRONSTEIN, ROBERTS & CUSHMAN, LLP 
STREET: 130 Water Street 
CITY: Boston 
STATE: MA 



COUNTRY: USA 
ZIP: 02109
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/513,974B

FILING DATE: 14-SEP-1995 

CLASSIFICATION: 536 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: PCT/JP95/01599

FILING DATE: 10-AUG-1995 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: JP 7-093989 

FILING DATE: 19-AUG-1995 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: JP 7-057186 

FILING DATE: 16-MAR-1995 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: JP 7-007177 

FILING DATE: 20-JAN-1995 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: JP 6-326611 

FILING DATE: 28-DEC-1994 
PRIOR APPLICATION DATA:

APPLICATION NUMBER: JP 6-270017 

FILING DATE: 02-NOV-1994 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: JP 6-236357 

FILING DATE: 30-SEP-1994 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: JP 6-236356 

FILING DATE: 30-SEP-1994 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: JP 6-189274 

FILING DATE: 11-AUG-1994
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: JP 6-189273 

FILING DATE: 11-AUG-1945
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: JP 6-189272 

FILING DATE: 11-AUG-1994
ATTORNEY/AGENT INFORMATION: 

NAME: Resnick, David S. 

REGISTRATION NUMBER: 34,235 

REFERENCE/DOCKET NUMBER: 45753
TELECOMMUNICATION INFORMATION: 

TELEPHONE: 617-523-3400

TELEFAX: 617-523-6440
INFORMATION FOR SEQ ID NO: 379:
SEQUENCE CHARACTERISTICS: 

LENGTH: 1023 base pairs 

TYPE: nucleic acid 

STRANDEDNESS: double

TOPOLOGY: linear 



MOLECULE TYPE: cDNA 

FEATURE:
; NAME/KEY: CDS

LOCATION: 37..1020
US-08-513-974B-379 

Query Match 5.2%; Score 79.6; DB 3; Length 1023; 

Best Local Similarity 46.0%; Pred. No. 1.4e-12; 

Matches 388; Conservative 0; Mismatches 444; Indels 12; Gaps 3; 

Qy 89 ATCTTGAATAAGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTT 148

I I I I I I I I I I I I I I I I III I M I I I I 

Db 103 AACTTCAAGCAACTGCTGCTGCCACCTGTGTATTCGGCGGTGCTGGCGGCTGGCCTGCCG 162 

Qy 149 GGGAATGTCACTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAAT 208

II I I I II I I I II II I I MINI 

Db 163 CTGAACATCTGTGTCATTACCCAGATCTGCACGTCCCGCCGGGCCCTGACCCGCACGGCC 222 

Qy 209 GTCTATCTTTTTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTG 268 

I I I I I I I I I I I I I II I II I I I I I I I I I I I I 1 I I 

Db 223 GTGTACACCCTAAACCTTGCTCTGGCTGACCTGCTATATGCCTGCTCCCTGCCCCTGCTC 282 

Qy 269 ATAAAGAGTTATGCCAA---TGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAAC 325

II II I I I I I I I Mill I I I I I I i I I I till I 

Db 283 ATCTACAACTATGCCCAAGGTGATCACTGGCCCTTTGGCGACTTCGCCTGCCGCCTGGTC 342 

Qy 326 CGATATGTGCTTCACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATG 385

III I I I I I I I I I I II I I I I M I I I I I I I I I I I I I I I I I I I I 

Db 343 CGCTTCCTCTTCTATGCCAACCTGCACGGCAGCATCCTCTTCCTCACCTGCATCAGCTTC 402

Qy 386 GAC C GAT AT C T GCT C AT GAAGT AC C C T T T C C GAGAAC AC T T T C T AC AAAAGAAG GAA 442 

I I I I I II I III I I I I I II III I 

Db 403 CAGCGCTACCTGGGCATCTGCCACCCGCTGGCCCCCTGGCACAAACGTGGGGGCCGCCGG 462

Qy 443 TTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATG 502

I I I I I I I I I I I I 'I I I I I I I I I ' I I I I II 

Db 463 GCTGCCTGGCTAGTGTGTGTAACCGTGTGGCTGGCCGTGACAACCCAGTGCCTGCCCACA 522

Qy 503 CTCACTTTCATCAATTCTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGT 562

Mill 111 I I III I Mil III 

Db 523 GCCATCTTCGCTGCCACAGGCATCCAGCGTAACCGCACTGTCTGCTATGACCTCAGCCCG 582 

Qy 563 TCTGGAAACCCTGAACACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTA 622 

III I I I I I I I II III I I M I I I I I II M M 

Db 583 CCTGCCCTGGCCACCCACTATATGCCCTATGGCATGGCTCTCACTGTCATCGGCTTCCTG 642

Qy 623 ATTCCTCTCTCTGTGATGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGG 682

I M I I I I II I I I I I I I III III I I 

Db 643 CTGCCCTTTGCTGCCCTGCTGGCCTGCTACTGTCTCCTGGCCTGCCGCCTGTGCCGCCAG 702

Qy 683 AGCCAGCAGCAAGCAACTG------CCCTGCCACTGGACAAACCCCAACGCCTGGTGGTC 736

I I I I III II I I I I I I I I I I II I I 

Db 703 GATGGCCCGGCAGAGCCTGTGGCCCAGGAGCGGCGTGGCAAGGCGGCCCGCATGGCCGTG 762

Qy 737 CTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATCATATCATGCGCAATTTGAGG 796

I I I I I M Ml III I I I I I I I I M I I II I 

Db 763 GTGGTGGCTGCTGCCTTTGCCATCAGCTTCCTGCCTTTTCACATCACCAAGACAGCCTAC 822 



Qy 797 ATCGCCTCACGCCTGGATAGTTGGCCACAAGGATGTACACAGAAGGCCATCAAATCTATA 856

Ml Ml I I I I I I I I I II I II 

Db 823 CTGGCAGTGGGCTCGACGCCGGGCGTCCCCTGCACTGTATTGGAGGCCTTTGCAGCGGCC 882

Qy 857 TACACACTGACACGGCCTCTGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTC 916

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I i I 

Db 883 TACAAAGGCACGCGGCCGTTTGCCAGTGCCAACAGCGTGCTGGACCCCATCCTCTTCTAC 942

Qy 917 CTCA 920 

I I I 

Db 943 TTCA 946 



RESULT 14 
US-08-432-174A-3 

Sequence 3, Application US/08432174A 
Patent No. 6562587 
GENERAL INFORMATION: 
APPLICANT: KIEFFER, BRIGITTE
TITLE OF INVENTION: NOVEL POLYPEPTIDES HAVING OPIOID RECEPTOR ACTIVITY,
TITLE OF INVENTION: NUCLEIC ACIDS CODING THEREFOR AND USES THEREOF
FILE REFERENCE: EX92009-US
CURRENT APPLICATION NUMBER: US/08/432,174A
CURRENT FILING DATE: 1995-05-10
NUMBER OF SEQ ID NOS: 4
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 3
LENGTH: 998
TYPE: DNA
ORGANISM: Homo sapiens
FEATURE:
NAME/KEY: CDS
LOCATION: (1)..(996)
NAME/KEY: modified_base
LOCATION: (922)
OTHER INFORMATION: a, t, c, g, other or unknown
NAME/KEY: modified_base
LOCATION: (927)
OTHER INFORMATION: a, t, c, g, other or unknown
NAME/KEY: modified_base
LOCATION: (931)..(932)
OTHER INFORMATION: a, t, c, g, other or unknown
US-08-432-174A-3



Query Match 5.1%; Score 79.2; DB 4; Length 998; 

Best Local Similarity 50.0%; Pred. No. 1.8e-12; 

Matches 198; Conservative 0; Mismatches 198; Indels 0; Gaps 0;



Qy 106 CCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGT 165

II I M III I II I I I I I I I I I I I I I I I I i Ml I

Db 33 CATCACCGCGCTCTACTCGGCCGTGTGCGCCGTGGGGCTGCTGGGCAACGTGCTTGTCAT 92



Qy 166 GTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCT 225

I I I I I I I I I I III I I I I II I I I I I I I I I I I M IMIt

Db 93 GTTCGGCATCGTCCGGTACACTAAGATGAAGACGACCACCAAGATCTACATCTTCAACCT 152



Qy 226 TTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAA 285

I I I I I I I MM II II II! !l IN I 

Db 153 GGCCTTAGCCGATGCGCTGGCCACCAGCACGCTGCCTTTCCAGAGTGCCAAGTACCTGAT 212 

Qy 286 TGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAA 345

II I I II III III! I I I I II I I I I M I I I 

Db 213 GGAGACGTGGCCCTTCGGCGAGCTGCTCTGCAAGGCTGTGCTCTCCATCGACTACTACAA 272

Qy 346 CCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAA 405

I I I I I I I I I M I I I It I I I Mill I I I I I I I I I I 
Db 273 TATGTTCACCAGCATCTTCACGCTCACCATGATGAGTGTTGACCGCTACATCGCTGTCTG 332

Qy 406 GTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGC 465

I I I I I I I II I I I 1 I MM I I I I I 

Db 333 CCACCCTGTCAAGGCCCTGGACTTCCGCACGCCTGCCAAGGCCAAGCTGATCAACATCTG 392 

Qy 466 TGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCAT 501

I II I I I I i I I I I I I Mill 

Db 393 TATCTGGGTCCTGGCCTCAGGCGTTGGCGTGCCCAT 428 



RESULT 15 

US-09-016-434-1190 

Sequence 1190, Application US/09016434 
Patent No. 6500938 
GENERAL INFORMATION: 

APPLICANT: Janice Au-Young 
APPLICANT: Jeffrey J. Seilhamer 

TITLE OF INVENTION: COMPOSITION FOR THE DETECTION OF SIGNALING 
TITLE OF INVENTION: PATHWAY GENE EXPRESSION 
NUMBER OF SEQUENCES: 1490
CORRESPONDENCE ADDRESS: 

ADDRESSEE: INCYTE PHARMACEUTICALS, INC. 
STREET: 3174 PORTER DRIVE 
CITY: PALO ALTO 
STATE: CALIFORNIA 
COUNTRY: USA 
ZIP: 94304
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: Word Perfect 6.1 for Windows /MS-DOS 6.2 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/09/016,434
FILING DATE: HEREWITH 
CLASSIFICATION: 
PRIOR APPLICATION DATA: 
APPLICATION NUMBER: 
FILING DATE: 
CLASSIFICATION: 
ATTORNEY/AGENT INFORMATION: 
NAME: Zeller, Karen J. 
REGISTRATION NUMBER: 37,071 
REFERENCE/DOCKET NUMBER: PA-0002 US
TELECOMMUNICATION INFORMATION:
TELEPHONE: (650) 855-0555
TELEFAX: (650) 845-4166
; INFORMATION FOR SEQ ID NO: 1190: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 1495 base pairs 

TYPE: nucleic acid 

STRANDEDNESS: single 

TOPOLOGY: linear 
IMMEDIATE SOURCE: 

LIBRARY: GENBANK 

CLONE: g179984
US-09-016-434-1190



Query Match 5.1%; Score 78.4; DB 4; Length 1495; 

Best Local Similarity 54.1%; Pred. No. 3.8e-12; 

Matches 160; Conservative 0; Mismatches 136; Indels 0; Gaps 0; 

Qy 107 CTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTG 166 

II I I III I II II I I I I I I I I I I I I I I II Mill 
Db 109 CTGCCCCCTCTGTACTCCTTGGTATTTGTCATTGGCCTGGTTGGAAACATCCTGGTGGTC 168 

Qy 167 TTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTT 226

III I Mill! I I I I I I I MM II I I I I I I 

Db 169 CTGGTCCTTGTGCAATACAAGAGGCTAAAAAACATGACCAGCATCTACCTCCTGAACCTG 228



Qy 227 TCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAAT 286

I M I I I I I II I I I I I I I I II I I II I I I I I 111 I I II 

Db 229 GCCATTTCTGACCTGCTCTTCCTGTTCACGCTTCCCTTCTGGATCGACTACAAGTTGAAG 288



Qy 287 GATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAAC 346

I I I I M I I II I I II I II I I III I I MM I 

Db 289 GATGACTGGGTTTTTGGTGATGCCATGTGTAAGATCCTCTCTGGGTTTTATTACACAGGC 348

Qy 347 CTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCAT 402

I I I I I I III I III III I II II III I il III III 
Db 349 TTGTACAGCGAGATCTTTTTCATCATCCTGCTGACGATTGACAGGTACCTGGCCAT 404



Search completed: December 14, 2003, 15:02:12
Job time: 107 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2003 Compugen Ltd. 



OM nucleic - nucleic search, using sw model

Run on: December 14, 2003, 15:00:25 ; Search time 516 Seconds
(without alignments)
9938.592 Million cell updates/sec

Title: US-09-891-138A-1
Perfect score: 1543
Sequence: 1 gctcctggcagagttttctg tgcctaaataaatcaatata 1543

Scoring table: IDENTITY_NUC
Gapop 10.0 , Gapext 1.0



Searched: 2201672 seqs, 1661799599 residues 

Total number of hits satisfying chosen parameters: 4403344 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 



Post-processing : 



Minimum Match 0% 
Maximum Match 100% 
Listing first 45 summaries 



Database 



Published_Applications_NA: *

/cgn2_6/ptodata/2/pubpna/US07_PUBCOMB.seq:*
/cgn2_6/ptodata/2/pubpna/PCT_NEW_PUB.seq:*
/cgn2_6/ptodata/2/pubpna/US06_NEW_PUB.seq:*
/cgn2_6/ptodata/2/pubpna/US06_PUBCOMB.seq:*
/cgn2_6/ptodata/2/pubpna/US07_NEW_PUB.seq:*
/cgn2_6/ptodata/2/pubpna/PCTUS_PUBCOMB.seq:*
/cgn2_6/ptodata/2/pubpna/US08_NEW_PUB.seq:*
/cgn2_6/ptodata/2/pubpna/US08_PUBCOMB.seq:*
/cgn2_6/ptodata/2/pubpna/US09A_PUBCOMB.seq:*
/cgn2_6/ptodata/2/pubpna/US09B_PUBCOMB.seq:*
/cgn2_6/ptodata/2/pubpna/US09C_PUBCOMB.seq:*
/cgn2_6/ptodata/2/pubpna/US09_NEW_PUB.seq:*
/cgn2_6/ptodata/2/pubpna/US09_NEW_PUB.seq2:*
/cgn2_6/ptodata/2/pubpna/US10A_PUBCOMB.seq:*
/cgn2_6/ptodata/2/pubpna/US10B_PUBCOMB.seq:*
/cgn2_6/ptodata/2/pubpna/US10_NEW_PUB.seq:*
/cgn2_6/ptodata/2/pubpna/US60_NEW_PUB.seq:*
/cgn2_6/ptodata/2/pubpna/US60_PUBCOMB.seq:*
Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution.

RESULT 1

US-09-891-138A-1
; Sequence 1, Application US/09891138A
; Publication No. US20030083245A1
; GENERAL INFORMATION:
; APPLICANT: Lin, Daniel Chi-Hong
; APPLICANT: Zhao, Jiagang
; APPLICANT: Chen, Jin-Long
; APPLICANT: Cutler, Gene
; APPLICANT: Tularik Inc.

; TITLE OF INVENTION: Novel Receptors
; FILE REFERENCE: 018781-006210US
; CURRENT APPLICATION NUMBER: US/09/891,138A
; CURRENT FILING DATE: 2001-06-25
; PRIOR APPLICATION NUMBER: US 60/213,461
; PRIOR FILING DATE: 2000-06-23
; NUMBER OF SEQ ID NOS: 26
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 1
; LENGTH: 1543
; TYPE: DNA
; ORGANISM: Mus musculus
; FEATURE:
; NAME/KEY: CDS
; LOCATION: (44)..(997)
; OTHER INFORMATION: mouse TGR18 G-protein coupled receptor (GPCR)
US-09-891-138A-1

Query Match 100.0%; Score 1543; DB 11; Length 1543; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 1543; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1 GCTCCTGGCAGAGTTTTCTGTCGAGACAGAAGCCGACAGCAGAATGGCACAGAATTTATC 60
     ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1 GCTCCTGGCAGAGTTTTCTGTCGAGACAGAAGCCGACAGCAGAATGGCACAGAATTTATC 60

Qy 61 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTTA 120
      ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 61 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTTA 120

Qy 121 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCTT 180
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCTT 180

Qy 181 CTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACTT 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 CTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACTT 240

Qy 241 TGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATAAGGGGACCTA 300
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 241 TGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATAAGGGGACCTA 300

Qy 301 TGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCAGCAT 360
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 301 TGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCAGCAT 360

Qy 361 CCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAAGTACCCTTTCCGAGA 420
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 361 CCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAAGTACCCTTTCCGAGA 420

Qy 421 ACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCTTAGT 480
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 421 ACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCTTAGT 480



Qy 481 GACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCCAAAAGAAGAGGGCAG 540
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 481 GACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCCAAAAGAAGAGGGCAG 540

Qy 541 TAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAACACAATCTCATTTACAGCCTCTG 600
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 541 TAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAACACAATCTCATTTACAGCCTCTG 600

Qy 601 CCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACAAGAT 660
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 601 CCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACAAGAT 660

Qy 661 GGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACAAACC 720
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 661 GGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACAAACC 720

Qy 721 CCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATCATAT 780
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 721 CCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATCATAT 780

Qy 781 CATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTTGGCCACAAGGATGTACACAGAA 840
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 781 CATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTTGGCCACAAGGATGTACACAGAA 840

Qy 841 GGCCATCAAATCTATATACACACTGACACGGCCTCTGGCCTTTCTGAACAGTGCCATCAA 900
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 841 GGCCATCAAATCTATATACACACTGACACGGCCTCTGGCCTTTCTGAACAGTGCCATCAA 900

Qy 901 TCCCATCTTCTACTTCCTCATGGGAGACCATTACAGAGAGATGCTGATTAGTAAGTTCAG 960
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 901 TCCCATCTTCTACTTCCTCATGGGAGACCATTACAGAGAGATGCTGATTAGTAAGTTCAG 960

Qy 961 ACAATACTTCAAGTCCCTTACATCCTTCAGGACATGAGCTGCTGGATGCAGGTCTTCACT 1020
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 961 ACAATACTTCAAGTCCCTTACATCCTTCAGGACATGAGCTGCTGGATGCAGGTCTTCACT 1020

Qy 1021 CAGCCAAAATGAGACACTTGATAAACAGTGCTGTGCAGTTGAGTTTTAACTAAGTAAACC 1080
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1021 CAGCCAAAATGAGACACTTGATAAACAGTGCTGTGCAGTTGAGTTTTAACTAAGTAAACC 1080

Qy 1081 ACCATTTCTAGGCTTTAGCTTTCCACCATCCTCCAACCCCCAGGGCTGGAGTACAAGCTG 1140
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1081 ACCATTTCTAGGCTTTAGCTTTCCACCATCCTCCAACCCCCAGGGCTGGAGTACAAGCTG 1140

Qy 1141 GGTCCACATGAATCAGAAGGCAGCTCTCTGTTCTGATTTTAGGTTATACCCAGAGTATGG 1200
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1141 GGTCCACATGAATCAGAAGGCAGCTCTCTGTTCTGATTTTAGGTTATACCCAGAGTATGG 1200

Qy 1201 AAAAAATAAGGCATGAGAAAGCATTGACATCTTCACTTAAGAACTGAACAAAAGAGAACA 1260
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1201 AAAAAATAAGGCATGAGAAAGCATTGACATCTTCACTTAAGAACTGAACAAAAGAGAACA 1260

Qy 1261 AATATTGTCAATGTTTGGACACTTAGGATCTGAAATCTTGGAAATTTTAAGACCTCTTTT 1320
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1261 AATATTGTCAATGTTTGGACACTTAGGATCTGAAATCTTGGAAATTTTAAGACCTCTTTT 1320



Qy 1321 TCTATCAGTGTAAAAGGAATACAAGATAGCTAGTTGCAAATGCTGAATGCATTTCATCAT 1380
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1321 TCTATCAGTGTAAAAGGAATACAAGATAGCTAGTTGCAAATGCTGAATGCATTTCATCAT 1380

Qy 1381 TGGTCAGGTCGATAAGCGTGTTTCTGAAATAGTCTTATTTTTATTCTTGTAATATTAAAA 1440
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1381 TGGTCAGGTCGATAAGCGTGTTTCTGAAATAGTCTTATTTTTATTCTTGTAATATTAAAA 1440



Qy 1441 TTTATGTGAAAAATGAATATAATTCAATGTACAAGATTAGATTTTCTATTTGAAAATTAT 1500

I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I M I I I II I I II I I I I I I I I II

Db 1441 TTTATGTGAAAAATGAATATAATTCAATGTACAACATTAGATTTTCTATTTGAAAATTAT 1500



Qy 



Db 




RESULT 2 



US-10-272-983-35
; Sequence 35, Application US/10272983
; Publication No. US20030148450A1
; GENERAL INFORMATION:
; APPLICANT: Chen, Ruoping
; APPLICANT: Dang, Huong T.
; APPLICANT: Liaw, Chen W.
; APPLICANT: Lin, I-Lin
; TITLE OF INVENTION: Human Orphan G Protein Coupled Receptors
; FILE REFERENCE: AREN0050
; CURRENT APPLICATION NUMBER: US/10/272,983
; CURRENT FILING DATE: 2002-10-17
; PRIOR APPLICATION NUMBER: US/09/417,044
; PRIOR FILING DATE: 1999-10-12
; PRIOR APPLICATION NUMBER: 60/109,213
; PRIOR FILING DATE: 1998-11-20
; PRIOR APPLICATION NUMBER: 60/120,416
; PRIOR FILING DATE: 1999-02-16
; PRIOR APPLICATION NUMBER: 60/121,851
; PRIOR FILING DATE: 1999-02-26
; PRIOR APPLICATION NUMBER: 60/123,946
; PRIOR FILING DATE: 1999-03-12
; PRIOR APPLICATION NUMBER: 60/123,949
; PRIOR FILING DATE: 1999-03-12
; PRIOR APPLICATION NUMBER: 60/136,436
; PRIOR FILING DATE: 1999-05-28
; PRIOR APPLICATION NUMBER: 60/136,437
; PRIOR FILING DATE: 1999-05-28
; PRIOR APPLICATION NUMBER: 60/136,439
; PRIOR FILING DATE: 1999-05-28
; PRIOR APPLICATION NUMBER: 60/136,567
; PRIOR FILING DATE: 1999-05-28
; Remaining Prior Application data removed - See File Wrapper or PALM.
; NUMBER OF SEQ ID NOS: 74
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 35
; LENGTH: 1005
; TYPE: DNA
; ORGANISM: Homo sapiens
US-10-272-983-35



Query Match 38.4%; Score 592.4; DB 13; Length 1005; 

Best Local Similarity 75.5%; Pred. No. 2.1e-138; 

Matches 750; Conservative 0; Mismatches 241; Indels 3; Gaps 1; 

Qy 39 GCAGAATGGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATA 98

II I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I 
Db 8 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 67 

Qy 99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158 

MINIMI! M I I I I I II I I I I I I I I I I I I I I I I I I I I II I II 

Db 68 AGTACTACCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCA 127 

Qy 159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218 

Ml II I I M I I I I I IE I I I I I M I I I I M I I I I I I I 1 I III I I I M I I 
Db 128 TTGTTGTTTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCT 187 

Qy 219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278

I I M M I II I I I I I I M I I Mill II I I I I I I M I Mill I I II I I I Mill 

Db 188 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 247

Qy 279 ATGCCAATGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTC 338

I M I I I I I I II III I I I I I I I I II I M I I I I M I II I ! I I I M I I I I I I M 

Db 248 ATGCCAATGGAAACTGGATATATGGAGACGTGCTCTGCATAAGCAACCGATATGTGCTTC 307

Qy 339 ACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGC 398

I I I I I I II I M M I I I II I Mill MINIM II I I I I I II I I II I II 

Db 308 ATGCCAACCTCTATACCAGCATTCTCTTTCTCACTTTTATCAGCATAGATCGATACTTGA 367

Qy 399 TCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCT 458

I N INN I N I II I I I II I I I I II II I II II I I I II I I I II I I I i I I ! I I I 
Db 368 TAATTAAGTATCCTTTCCGAGAACACCTTCTGCAAAAGAAAGAGTTTGCTATTTTAATCT 427

Qy 459 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518

I I I I I I I II I I II I I I I I I N I I I I I N II N II I I II II I 
Db 428 CCTTGGCCATTTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 487

Qy 519 CTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAAC 578 

I II I II II I II II I I I I I II I I II II I II II I I I I I II I 

Db 488 CTGTTATAACTGACAATGGCACCACCTGTAATGATTTTGCAAGTTCTGGAGACCCCAACT 547

Qy 579 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638 

II I I I I I I II I I I I I I I II I I II I I I I 1 I t I I I I I II I II I II I II I if 

Db 548 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 607

Qy 639 TGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAA 698 

II I I N II I II I I I II II I I I N I I I N I I II I IE I II I III 

Db 608 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 667

Qy 699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758 

II II I I I N ! I I I i I II I I I I II I I II II f I N | | | | | N I I 

Db 668 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 727 

Qy 759 TACTCTTCACACCCTATCATATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTT 818

I N II I II I I I I I I I I I I I I I N ill I N II I I I I I I II II I II I I I I I I 
Db 728 TGCTTTTTACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTT 787



Qy 819 GGCCACAAGGATGTACACAGAAGGCCATCAAATCTATATACACACTGACACGGCCTC 875

I II I I I I I I I I I I I I I 1 I I I I I I I I I I I I II I I I I I 

Db 788 GGAAGCAGTATCAGTGCACTCAGGTCGTCATCAACTCCTTTTACATTGTGACACGGCCTT 847

Qy 876 TGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTCCTCATGGGAGACCATTACA 935

I I I II I I I I II I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I II I II 

Db 848 TGGCCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 907

Qy 936 GAGAGATGCTGATTAGTAAGTTCAGACAATACTTCAAGTCCCTTACATCCTTCAGGACAT 995

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I t I M I I I II 
Db 908 GGGACATGCTGATGAATCAACTGAGACACAACTTCAAATCCCTTACATCCTTTAGCAGAT 967

Qy 996 GAGCTGCTGGATGCAGGTCTTCACTCAGCCAAAA 1029

I I I I III I I I I I I I I I I I I I 

Db 968 GGGCTCATGAACTCCTACTTTCATTCAGAGAAAA 1001



RESULT 3 

US-10-393-807-35
; Sequence 35, Application US/10393807
; Publication No. US20030175891A1
; GENERAL INFORMATION:
; APPLICANT: Chen, Ruoping
; APPLICANT: Dang, Huong T.
; APPLICANT: Liaw, Chen W.
; APPLICANT: Lin, I-Lin
; TITLE OF INVENTION: Human Orphan G Protein Coupled Receptors
; FILE REFERENCE: AREN0050
; CURRENT APPLICATION NUMBER: US/10/393,807
; CURRENT FILING DATE: 2003-03-21
; PRIOR APPLICATION NUMBER: US/09/417,044
; PRIOR FILING DATE: 1999-10-12
; PRIOR APPLICATION NUMBER: 60/109,213
; PRIOR FILING DATE: 1998-11-20
; PRIOR APPLICATION NUMBER: 60/120,416
; PRIOR FILING DATE: 1999-02-16
; PRIOR APPLICATION NUMBER: 60/121,851
; PRIOR FILING DATE: 1999-02-26
; PRIOR APPLICATION NUMBER: 60/123,946
; PRIOR FILING DATE: 1999-03-12
; PRIOR APPLICATION NUMBER: 60/123,949
; PRIOR FILING DATE: 1999-03-12
; PRIOR APPLICATION NUMBER: 60/136,436
; PRIOR FILING DATE: 1999-05-28
; PRIOR APPLICATION NUMBER: 60/136,437
; PRIOR FILING DATE: 1999-05-28
; PRIOR APPLICATION NUMBER: 60/136,439
; PRIOR FILING DATE: 1999-05-28
; PRIOR APPLICATION NUMBER: 60/136,567
; PRIOR FILING DATE: 1999-05-28
; Remaining Prior Application data removed - See File Wrapper or PALM.
; NUMBER OF SEQ ID NOS: 74
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 35
; LENGTH: 1005
; TYPE: DNA
; ORGANISM: Homo sapiens
US-10-393-807-35



Query Match 38.4%; Score 592.4; DB 13; Length 1005; 

Best Local Similarity 75.5%; Pred. No. 2.1e-138; 

Matches 750; Conservative 0; Mismatches 241; Indels 3; Gaps 1; 

Qy 39 GCAGAATGGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATA 98

II MINI I I I I I I II I I I I I I I I I I I I I I I I I I II Mil l 
Db 8 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 67 

Qy 99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158 

I I M II I I II If I I I I I I I II I I I I I I I I I I II I I I I I I I I I II 

Db 68 AGTACTACCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCA 127 

Qy 159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218 

I I I M I I I I I I I I Mill! I I I II I I I I I I I I I I I I I I Ml I INN I 
Db 128 TTGTTGTTTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCT 187

Qy 219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I Mill. 
Db 188 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 247

Qy 279 ATGCCAATGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTC 338

M I II I I I I II III I II I I I I I II Mill II M I I I I I I II I II II I II II 
Db 248 ATGCCAATGGAAACTGGATATATGGAGACGTGCTCTGCATAAGCAACCGATATGTGCTTC 307

Qy 339 ACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGC 398

I I M II M I M II I II I I I I II II . II I II II I M Mill II I II I I II 
Db 308 ATGCCAACCTCTATACCAGCATTCTCTTTCTCACTTTTATCAGCATAGATCGATACTTGA 367 

Qy 399 TCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCT 458

I M IMM I II I II I II II I I II MM II II M I I II I II IE I I I II I I I II 
Db 368 TAATTAAGTATCCTTTCCGAGAACACCTTCTGCAAAAGAAAGAGTTTGCTATTTTAATCT 427

Qy 459 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518

I MM II III II II I I II I I I II I M II I I II I I I I I I | || 
Db 428 CCTTGGCCATTTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 487

Qy 519 CTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAAC 578

MM II II I II II Mill M I II I I II II I I II II I M I 
Db 488 CTGTTATAACTGACAATGGCACCACCTGTAATGATTTTGCAAGTTCTGGAGACCCCAACT 547

Qy 579 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638

I I I I I I I I M I M I I M II II II M II II II II II II II I I | || | || | i 
Db 548 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 607

Qy 639 TGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAA 698

I M M I M II I M I II II M II II I I II I I ' I I M I II I I III 

Db 608 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 667 

Qy 699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758

I M M II I II II II II II Mill II I I II M I II I I II I I I I 

Db 668 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 727 

Qy 759 TACTCTTCACACCCTATCATATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTT 818

I II II II I II I I II II I I M I II III I II I II I I I I II II II I I I I I II I 



Db 728 TGCTTTTTACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTT 787



Qy 819 GGCCACAAGGATGTACACAGAAGGCCATCAAATCTATATACACACTGACACGGCCTC 875

I II I I I I I I I I I I I I I I I M Mill I I I I II I I I I I 

Db 788 GGAAGCAGTATCAGTGCACTCAGGTCGTCATCAACTCCTTTTACATTGTGACACGGCCTT 847



Qy 876 TGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTCCTCATGGGAGACCATTACA 935

I I I I I I I I I I II I I I I I I I II I I I I I M I I I I I II II I M I I I I II I II 
Db 848 TGGCCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 907 

Qy 936 GAGAGATGCTGATTAGTAAGTTCAGACAATACTTCAAGTCCCTTACATCCTTCAGGACAT 995

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II 
Db 908 GGGACATGCTGATGAATCAACTGAGACACAACTTCAAATCCCTTACATCCTTTAGCAGAT 967



Qy 996 GAGCTGCTGGATGCAGGTCTTCACTCAGCCAAAA 1029

I I I I III I I I I I I I I I I I I I 

Db 968 GGGCTCATGAACTCCTACTTTCATTCAGAGAAAA 1001



RESULT 4 

US-10-225-567A-566
; Sequence 566, Application US/10225567A
; Publication No. US20030113798A1
; GENERAL INFORMATION:
; APPLICANT: Lifespan Biosciences
; APPLICANT: Brown, Joseph P.
; APPLICANT: Burmer, Glenna C.
; APPLICANT: Roush, Christine L.
; TITLE OF INVENTION: ANTIGENIC PEPTIDES AND ANTIBODIES FOR G PROTEIN-COUPLED RECEPTORS (GPCRS)
; FILE REFERENCE: 1920-4-4
; CURRENT APPLICATION NUMBER: US/10/225,567A
; CURRENT FILING DATE: 2001-12-19
; PRIOR APPLICATION NUMBER: 60/257,144
; PRIOR FILING DATE: 2000-12-19
; NUMBER OF SEQ ID NOS: 2292
; SOFTWARE: PatentIn version 3.1
; SEQ ID NO 566
; LENGTH: 1380
; TYPE: DNA
; ORGANISM: Homo sapiens
US-10-225-567A-566

Query Match 38.4%; Score 592.4; DB 15; Length 1380; 

Best Local Similarity 75.3%; Pred. No. 2.6e-138; 

Matches 764; Conservative 0; Mismatches 246; Indels 4; Gaps 2; 
Qy 39 GCAGAATGGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATA 98

II MINI I I I 1 I I I M I I I I I I M I I I I I I I II I I Mill 

Db 50 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 109



Qy 99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158 

I I I I I I I I I I I I I I I I I I I I I I I I I I II I II M II I I I I I I I I I 

Db 110 AGTACTACCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCA 169 



Qy 159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218 

III II I I I II I I I MINI I I I I I I I I I I I I I I II I I I III I I I I I I I 



Db 170 TTGTTGTTTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCT 229

Qy 219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278

I I I I I I I II I! I I I I I I I I I I I I I I I I I II I I I I I I I I I! I I I I I I I I I I I I 
Db 230 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 289 

Qy 279 ATGCCAATGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTC 338

MINIMI II III I II I I I I I II I I I I I I M M I II I I I II I M I I I I II 

Db 290 ATGCCAATGGAAACTGGATATATGGAGACGTGCTCTGCATAAGCAACCGATATGTGCTTC 349

Qy 339 ACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGC 398

I I! II I I II I I I II I I I II I II II II I M M I II I I I I I II Mill II 
Db 350 ATGCCAACCTCTATACCAGCATTCTCTTTCTCACTTTTATCAGCATAGATCGATACTTGA 409

Qy 399 TCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCT 458

I II I I M I II I I I I I I II I I I I I M M I I I I I I II II Mill II I II I I I I I 
Db 410 TAATTAAGTATCCTTTCCGAGAACACCTTCTGCAAAAGAAAGAGTTTGCTATTTTAATCT 469 

Qy 459 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518

I II II I I II I I I II I II I II I II I II II I II I I I I (Mill 

Db 470 CCTTGGCCATTTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 529

Qy 519 CTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAAC 578

MM M II I I I II Mill I I I I I I I II I II I II I I M I I 
Db 530 CTGTTATAACTGACAATGGCACCACCTGTAATGATTTTGCAAGTTCTGGAGACCCCAACT 589

Qy 579 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638

II I I II I I II II II I I I M II II II M II I I I I II I I II I I II I I I II I 

Db 590 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 649

Qy 639 TGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAA 698 

M M M I I I I 1 M I I I II I I I I II I II II I II I II MM III 

Db 650 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 709 

Qy 699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758 

I M I I I I II II I I M II I I M M II II I I II I I II II I I II I 

Db 710 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 769 

Qy 759 TACTCTTCACACCCTATCATATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTT 818

I II II I II II II II I I II I M II III I I I I I I I I II II I I I I II I I I I I I 
Db 770 TGCTTTTTACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTT 829

Qy 819 GGCCACAAGGATGTACACAGAAGGCCATCAAATCTATATACACACTGACACGGCCTC 875

I II I I I I I M I I I II I M II I II I I II II I II I I M 

Db 830 GGAAGCAGTATCAGTGCACTCAGGTCGTCATCAACTCCTTTTACATTGTGACACGGCCTT 889

Qy 876 TGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTCCTCATGGGAGACCATTACA 935

M I II II II I M II I M I II I I II II II II I II II II I II 1 I II II I II 
Db 890 TGGCCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 949 

Qy 936 GAGAGATGCTGATTAGTAAGTTCAGACAATACTTCAAGTCCCTTACATCCTTCAGGACAT 995

I II I I II II II I I I I II II I II I I I II II M I II I II II I I II I I I 
Db 950 GGGACATGCTGATGAATCAACTGAGACACAACTTCAAATCCCTTACATCCTTTAGCAGAT 1009

Qy 996 GAGCTGCTGGATGCAGGTCTTCACTCAGCCAAAA-TGAGACACTTGATAAACAG 1048

I III II I I II M I I II M I I II II MM II I II I 

Db 1010 GGGCTCATGAACTCCTACTTTCATTCAGAGAAAAGTGAGGGGCTTGTGAAACAG 1063 



RESULT 5 

US-09-764-886-36 

Sequence 36, Application US/09764886 
Publication No. US20030139327A9 
GENERAL INFORMATION: 
APPLICANT: Rosen et al.

TITLE OF INVENTION: Nucleic Acids, Proteins, and Antibodies
FILE REFERENCE: PTZ02

CURRENT APPLICATION NUMBER: US/09/764,886
CURRENT FILING DATE: 2001-01-17

Prior application data removed - consult PALM or file wrapper
NUMBER OF SEQ ID NOS: 88
SOFTWARE: PatentIn Ver. 2.0
SEQ ID NO 36
LENGTH: 1436
TYPE: DNA 

ORGANISM: Homo sapiens 
US-09-764-886-36 

Query Match 38.4%; Score 592.4; DB 13; Length 1436; 

Best Local Similarity 75.3%; Pred. No. 2.6e-138; 

Matches 764; Conservative 0; Mismatches 246; Indels 4; Gaps 2; 
Qy 39 GCAGAATGGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATA 98

I I MINI I I I I I I I I I I II III I I I I I I I I I I I I I I II I j 

Db 100 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 159 

Qy 99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158 

I I I I I I i I I I I I I I I I I I I I I I I i i I I II I I I I I I I I I I I I I II 

Db 160 AGTACTACCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCA 219 

Qy 159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218 

Ml II I I I I I I I I MINI I I I I I I M II I I I I I 11 I I III I I I I I I I 
Db 220 TTGTTGTTTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCT 279

Qy 219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278

I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I (Mil I I I I I I I I I I I I 
Db 280 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 339

Qy 279 ATGCCAATGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTC 338

I I M I I I I I II III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 340 ATGCCAATGGAAACTGGATATATGGAGACGTGCTCTGCATAAGCAACCGATATGTGCTTC 399

Qy 339 ACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGC 398

I I I I I I I I M I I I I I I I I I Mill I I M II II II I I I I I II I I M I II 
Db 400 ATGCCAACCTCTATACCAGCATTCTCTTTCTCACTTTTATCAGCATAGATCGATACTTGA 459

Qy 399 TCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCT 458

I M Mil! I I I I I I M II I 1 I II I I I I I M I M I I II I II II M I M I I M I 

Db 460 TAATTAAGTATCCTTTCCGAGAACACCTTCTGCAAAAGAAAGAGTTTGCTATTTTAATCT 519



Qy 

Db 



459 
520 



518 



579 



Qy 519 CTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAAC 578 

I I I I II I I I I I I I I I I I I M I I I I I I I I I I M I I I I I I I 
Db 580 CTGTTATAACTGACAATGGCACCACCTGTAATGATTTTGCAAGTTCTGGAGACCCCAACT 639

Qy 579 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638

I I I I I I I III I I I I I I I II II II I I I I I | | | | | M I I | I | I I I I II I I I 
Db 640 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 699

Qy 639 TGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAA 698

Mill I I I I I ! I I I I I II I I I I I I III 

Db 7 00 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 759 

Qy 699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 75 8 

I I I I I I I M I I I I I I I I I I I II I I I I I I I I I I I I I I I II I I I 

Db 760 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 819 

Qy 759 T AC T CT T CAC AC CCT AT CAT AT CAT GC GCAAT T T GAG GAT C GCC T C AC G C C T GGAT AGT T 818 

I M M M I I I I I I I I I I I M I I I III I I I I I I I I I I I I I I I I I I I i I I I I 
Db 82 0 T G CT T T T TACAC CC T AT CAC GT C AT GC G GAAT GT GAGGAT C G CT T C AC G C CT GG G GAGT T 87 9 

Qy 819 G GC CACAAGGAT GTACACAGAAGGCCAT CAAAT CTATATACACACT GACACGGC CT C 875 

I M I II M III I I I I I I I I I I I I I I I I I I I I I I I i I 

Db 880 GGAAGCAGTATCAGTGCACTCAGGTCGTCATCAACTCCTTTTACATTGTGACACGGCCTT 939 

Qy 87 6 TGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTCCTCATGGGAGACCATTACA 935 

I II I I I I I II I I M I I I I I I I I I I II I II I I I I I I II I I I I II I II I II 
Db 94 0 TGGCCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 999 

Qy 936 GAGAGAT GCT GATTAGTAAGTT C AGACAATACTTCAAGT C CCT T ACAT C CTT CAGGACAT 995 

I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I M I M I I I I I I I | | || 
Db 1000 G GGACAT G CT GAT GAAT CAACT G AGACAC AAC T T CAAAT C C CT T AC AT C C TT T AG CAGAT 1059 

Qy 996 GAGCTGCTGGATGCAGGTCTTCACTCAGCCAAAA-TGAGACACTTGATAAACAG 1048

I I I I Ml I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1060 GGGCTCATGAACTCCTACTTTCATTCAGAGAAAAGTGAGGGGCTTGTGAAACAG 1113 



RESULT 6 

US-09-764-886-11 

; Sequence 11, Application US/09764886
; Publication No. US20030139327A9
; GENERAL INFORMATION:
; APPLICANT: Rosen et al.
; TITLE OF INVENTION: Nucleic Acids, Proteins, and Antibodies
; FILE REFERENCE: PTZ02
; CURRENT APPLICATION NUMBER: US/09/764,886
; CURRENT FILING DATE: 2001-01-17
; Prior application data removed - consult PALM or file wrapper
; NUMBER OF SEQ ID NOS: 88
; SOFTWARE: PatentIn Ver. 2.0
; SEQ ID NO 11
; LENGTH: 4232
; TYPE: DNA
; ORGANISM: Homo sapiens
US-09-764-886-11 



Query Match 38.4%; Score 592.4; DB 13; Length 4232;

Best Local Similarity 75.3%; Pred. No. 5.1e-138;

Matches 764; Conservative 0; Mismatches 246; Indels 4; Gaps 2; 
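
The percentages in the summary lines above appear to follow from the other figures reported in this listing. The following is a minimal sketch, assuming (the listing does not state this) that Query Match is the hit score divided by the perfect score of 1543 given in the search header, and that Best Local Similarity is the match count divided by the number of alignment columns (matches + mismatches + indels). The function names are hypothetical and are not part of the search program.

```python
def query_match_pct(score: float, perfect_score: float) -> float:
    """Hit score as a percentage of the perfect (self-match) score.

    Assumption: this is how the 'Query Match' column is derived.
    """
    return 100.0 * score / perfect_score


def best_local_similarity_pct(matches: int, mismatches: int, indels: int) -> float:
    """Matches as a percentage of all alignment columns.

    Assumption: alignment columns = matches + mismatches + indels.
    """
    columns = matches + mismatches + indels
    return 100.0 * matches / columns


# Figures reported for RESULT 6; perfect score taken from the search header.
qm = query_match_pct(592.4, 1543)
bls = best_local_similarity_pct(764, 246, 4)
print(f"Query Match {qm:.1f}%; Best Local Similarity {bls:.1f}%")
# prints: Query Match 38.4%; Best Local Similarity 75.3%
```

Under the same assumptions, the figures reported for the 1014-residue hits (Score 126.6; Matches 377; Mismatches 369; Indels 9) give 8.2% and 49.9%, in agreement with their summary lines.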



Qy 39 GCAGAAT GGCACAGAATT TAT CTT GTGAGAATT GGT T GGCAACAGAGGCTAT CTT GAATA 98 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 110 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 169 

Qy 99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158 

M I I I I I I I I I I I II I I I I I I I I I I | I | | | | M I I II I I M I II 

Db 170 AGTACTACCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCA 22 9 

Qy 159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218

Ml M I I II It I I I I I I I I II I I I I M I [ I I | | | I | | | Ml | | | | | | | 
Db 230 TTGTTGTTTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCT 28 9 

Qy 219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 27 8 

IMMM II I I I I I I I I I I I I I I I I i I i I I I I I I I I I I I I I I I I I I I I I I I | 

Db 2 90 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 34 9 

Qy 279 AT GC CAAT GATAAG G G GAC CT AT G GAGAT GT T CT CT GT AT AAGC AAC C GAT AT GT GCTT C 338 

I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 350 AT GC CAATGGAAACT GGATATAT GGAGAC GT GCT CT GCATAAGCAAC CGATATGT GCTT C 409 

Qy 339 ACAC CAAC C T CT ACAC CAGC AT CCTCTTCCT CACTT T CAT TAG CAT G GAC C GAT AT C TGC 39 8 

I I I II I I I I I I I I i I I I I I I I I I I | I | i | | | | || | | | | | || | | | | | || 
Db 410 AT GC CAAC CT C TAT AC CAGC AT TCTCTTTCT CAC TT T T AT C AGCATAG AT C GAT ACT T GA 4 69 

Qy 399 TCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCT 458

I M I I I I I I I I I I I I I I I I I I I I I I I I MINIM II I I I M I I I I I I I I II 
Db 470 TAATTAAGTATCCTTTCCGAGAACACCTTCTGCAAAAGAAAGAGTTTGCTATTTTAATCT 52 9 

Qy 459 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518 

I I I M I I I I I I I I M II I II M I I I I II II I I I I I I II I II 
Db 530 C CTT GG C CAT T T G GGT T T T AGT AAC C T T AGAGT T AC T AC C CAT AC T T C C C CT T AT AAAT C 58 9 

Qy 519 CT GT C C CAAAAGAAGAGGG CAGT AACT GCAT C GACT AT G CAAGTT CT GGAAAC CCT GAAC 578 

I I I I M II I I I II I II I I I ! I II I II II I II II I I I II I 

Db 590 CTGTTATAACTGACAATGGCACCACCTGTAATGATTTTGCAAGTTCTGGAGACCCCAACT 649

Qy 57 9 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638 

MM II II I II II I I I i M II II I I I I I I I Mill i I I M II I I Mill 
Db 650 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 709 

Qy 639 TGT GCTT CTT CT ACT ACAAGATGGT AGT CTT CTT AAAGAGGAGGAGCCAGCAGCAAGCAA 698 

i M I I M II I f M M I II I I I II II II I II I I II I I II I Ml 

Db 710 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 769 

Qy 699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758 

I I I I M M I M II II II I I II I I II II I I I I II I II | | || || 

Db 77 0 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 829 

Qy 759 T AC T C T T CAC AC C C TAT CAT AT CAT GC GCAAT T T GAG GAT C GC CT CAC GC CT GGAT AGT T 818 

I M II I I II II II II I II I I II I I I I M I I I II 1 II I I I II II I II II II 

Db 830 TGCTTTTTACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTT 88 9 

Qy 819 G GCCACAAGGATGTACACAGAAGGCCATCAAATCTATATACACACTGACACGGCCTC 87 5 

I II I . I I I I I I! I I II I II I I . I I I M I I I II I I I M I 



Db 890 GGAAGCAGTATCAGTGCACTCAGGTCGTCATCAACTCCTTTTACATTGTGACACGGCCTT 949



Qy 87 6 TGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTCCTCATGGGAGACCATTACA 935 

i I I I I I I M M I I I I I I I | | | | | | || | M I I I I II II I I I I II I II I II 
Db 950 TGGCCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 1009 



Qy 936 GAGAGAT GCT GAT T AGT AAGT T C AGACAAT AC TT CAAGT C C C T T AC AT C CT T CAGGACAT 995 

I 1 i I I I I I I M I I I I Mill II I I I I I I I M I I I I I I I I I I I I I I I 
Db 1010 G GGACAT GC T GAT GAAT CAACT GAGAC ACAACT T CAAAT C C CT T AC AT C CT T T AGCAGAT 1069 



Qy 996 GAGC T G CT GGAT G CAGGT C T T CAC T C AGC CAAAA- T GAGAC AC T T G AT AAAC AG 104 8 

I I I I III I I I I I I II I I I I I III! I II I I! I I I I 

Db 107 0 GGG CT CAT GAACT C CT ACT T T CAT T C AGAGAAAAGT GAGG G GC T T GT GAAAC AG 1123 



RESULT 7 
US-10-270-587-1 

; Sequence 1, Application US/10270587
; Publication No. US20030054487A1
; GENERAL INFORMATION:
; APPLICANT: Li, Yi
; TITLE OF INVENTION: Human G-Protein Coupled Receptor
; FILE REFERENCE: PF217C2
; CURRENT APPLICATION NUMBER: US/10/270,587
; CURRENT FILING DATE: 2002-10-16
; PRIOR APPLICATION NUMBER: US 09/908,593
; PRIOR FILING DATE: 2001-07-20
; PRIOR APPLICATION NUMBER: US 08/781,456
; PRIOR FILING DATE: 1997-01-10
; PRIOR APPLICATION NUMBER: US 60/009,902
; PRIOR FILING DATE: 1996-01-11
; NUMBER OF SEQ ID NOS: 9
; SOFTWARE: PatentIn version 3.1
; SEQ ID NO 1
; LENGTH: 1428
; TYPE: DNA
; ORGANISM: Homo sapiens
US-10-270-587-1 

Query Match 38.3%; Score 590.8; DB 15; Length 1428;

Best Local Similarity 75.2%; Pred. No. 6.7e-138;

Matches 763; Conservative 0; Mismatches 247; Indels 4; Gaps 2;

Qy 3 9 G C AGAAT GGCACAGAAT T TAT CT T GT GAGAAT T GGT T G GCAAC AGAGGC TAT C T T GAAT A 98 

II I I I I I I I I I I I Mil I I I I I I I I I I I I I I I I I I I I I I I I 
Db 99 G GAT CAT G G CAT GGAAT G CAACT T GCAAAAAC T GG CT G GCAG C AGAGGC T G C C CT G GAAA 158 



Qy 99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158 

I I I I I I I I I I II I I M I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 159 AGTACTACCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCA 218 

Qy 159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218

III II I I I I I M I MINI II I II I I II I I I I I I I I II III I I I I II I 
Db 219 T T GTT GT T T AC G GCT AC AT CTTCTCTCT GAAGAAC T GGAAC AGC AGT AAT AT T TAT C T CT 27 8 



Qy 219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278

I I I I I I I II II II I I III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 



Db 279 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 338

Qy 27 9 AT G C CAAT GAT AAGGGGAC CT AT GGAGAT GT T CT CT GT AT AAGCAAC C GAT AT GT GC T T C 338 

I I I I I I I I I II III I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 339 AT G C CAAT G GAAACT GGAT AT AT G GAGAC GT GCT CT GCAT AAG CAAC C GAT AT GTG C T T C 3 98 

Qy 33 9 AC ACC AAC CT CT AC AC CAGCAT C C T CT T C CT C ACT T T CAT TAG CAT GGAC C GAT AT CT GC 398 

I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I II I I I I I II 
Db 39 9 AT G CC AAC CT CT AT AC C AGC ATT C T CT T T CT C ACT T TT AT C AG CAT AGAT C GAT ACT T GA 458 

Qy 39 9 T CAT GAAGT AC C C T TT C C GAGAAC ACT T T CT ACAAAAGAAGGAAT TT GC CAT T T T AAT CT 458 

I II Mill I I I I I I I I I I I I I I I I I I I I I I I I I I I II I III I I I I I I I I I I 

Db 4 59 TAATT AAGT AT C C T TT C C GAGAAC AC CT T CT G CAAAAGAAAGAGT GT GCT AT T T T AAT C T 518 

Qy 4 59 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518 

I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II III 
Db 519 CCTTGGCCATGTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 57 8 

Qy 519 CT GT C C C AAAAGAAGAGGG C AGT AAC T GCAT C GACT AT G CAAGTT CT GGAAAC C CT GAAC 57 8 

I I I I II II I I M I I III i I I I I I I I II I I M I I I I I I I I 
Db 579 CT GTT AT AACT GACAAT GG C AC CAC C T GT AAT GAT T T T GCAAGTT CT G GAGAC C CC AAC T 638 

Qy 57 9 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638 

I I I I II I I I I I I I I I I I II II II I I I I t I I Mill I M I I II I I Mill 

Db 639 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 698 

Qy 639 TGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAA 698 

I II I II M I I I M I I II I I I I II II II II I Mill II I I III 

Db 699 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 758 

Qy 699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758 

II II 11 I II I I I I I I I I I I II I I I I I II I I I I I II I II II I I 

Db 7 59 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 818 

Qy 759 T ACT C T T C ACAC C CT AT CAT AT CAT GC GCAAT TT GAGGAT C GC CT C AC GC CT GGAT AGT T 818 

I II M II I I I 11 I I I I I I II II I III I M I II II II I II M I I M I II I I 

Db 819 TGCTTTTTACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTT 87 8 

Qy 819 G GCCACAAGGATGT ACACAGAAGGC CAT CAAAT CT ATAT ACACACT GACACGGCCTC 87 5 

I II I I I M I II I II II I I I I Mill I I I II II I II I 

Db 879 GGAAGCAGT AT CAGTGCACTCAGGTCGT CATC AACT CCTTTTACATTGTGACACGGCCTG 938 

Qy 87 6 T GGCCT TT CT GAAC AGT GC CAT CAAT CC CAT CTT CT ACTT C CT CAT GGGAGACC ATT ACA 935 

I II I I II I I I I III M I I II I I II M II I I II I II II II I II M II I II 
Db 939 TGGCCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTGTGGGAGATCACTTCA 998 

Qy 93 6 GAG AG AT GCT GAT T AGT AAGT T CAGAC AAT ACTT CAAGT C C CT TAC AT C CT T CAGGACAT 995 

I I I II I I II I I I I I I II I I I II M I M I I II I I I I I I I II I I I I II 
Db 999 G G GAC AT GCT GAT GAAT CAACT GAGAC ACAACTT CAAAT C C CT T AC AT C CT TTAGCAGAT 105 8 

Qy 996 GAG CT G CT G GAT GCAG GT C T T C ACT C AGC CAAAA- T GAGAC ACTT GAT AAAC AG 104 8 

I Ml III I II I I I I I I II I I II I I I I I I I I I I II 

Db 1059 GGGCTCATGAACTCCTACTTTCATTCAGAGAAAAGTGAGGGGCTTGTGAAACAG 1112 



RESULT 8 
US-09-943-798-3 



; Sequence 3, Application US/09943798
; Patent No. US20020065215A1
; GENERAL INFORMATION:
; APPLICANT: Glaxo Group Limited
; TITLE OF INVENTION: Polypeptide
; FILE REFERENCE: QG1021
; CURRENT APPLICATION NUMBER: US/09/943,798
; CURRENT FILING DATE: 2001-08-31
; NUMBER OF SEQ ID NOS: 4
; SOFTWARE: FastSEQ for Windows Version 3.0
; SEQ ID NO 3
; LENGTH: 1014
; TYPE: DNA
; ORGANISM: Homo sapiens
US-09-943-798-3 



Query Match 8.2%; Score 126.6; DB 9; Length 1014; 

Best Local Similarity 49.9%; Pred. No. 3e-21; 

Matches 377; Conservative 0; Mismatches 369; Indels 9; Gaps 2; 

Qy 60 CTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTT 119 

I II I I Mill I II I I I I I I I I I I I I I I I I III 

Db 59 CTT T T GGAAAT T G C AC T GAT GAAAACAT C C CACT CAAGAT GC AC T AC C T C C CT GT T AT T T 118 

Qy 120 ATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCT 17 9 

III II I I I I I I I I I I I I I 11 I I I I I I I Mill 

Db 119 AT GGCAT TAT CTT C CT C GT G G GAT T T C CAGG C AAT GC AGT AGT GAT AT C CACTT AC AT T T 17 8 



Qy 180 TCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACT 2 39 

II II I I I II I I I II I I I I II II I I II I I II I I I I 

Db 179 T CAAAAT GAG AC C T T G GAAGAGCAG CAC CAT CAT TAT G CT GAAC CT GG C CT GC ACAGAT C 238 

Qy 240 TTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAAT GATAAGGGGA 296 

I I III I I I I I M I II I II M I II I I I II I I I II I I I 

Db 239 TGCTGTATCTGACCAGCCTCCCCTTCCTGATTCACTACTATGCCAGTGGCGAAAACTGGA 298



Qy 2 97 C C T AT GGAGAT GT T CT CT GT ATAAGCAAC C GAT AT GT G CT T C ACAC CAAC CT CT ACAC C A 356 

II II I I II I I I I II I I I II I III I II II I M I I I 

Db 2 99 TCTTTGGAGATTTCATGTGTAAGTTTATCCGCTTCAGCTTCCATTTCAACCTGTATAGCA 358 



Qy 357 G CAT CCT C T T C CT CACT T T CATT AGCAT GG AC C GAT AT CT GCTCAT GAAGT ACC C T T T C C 416 

II I I II I I II II II II I I I II I I I I I II III! I I I I I 

Db 359 GCAT CCT CTT CCT CACCTGTTTCAGCATCTTCCGCTACTGTGTGAT CATT CACCCAATGA 418 

Qy 417 GAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCT 476 

I ! I II M I II I II I I I I II I I I I I I I 

Db 419 GCTGCTTTTCCATTCACAAAACTCGATGTGCAGTTGTAGCCTGTGCTGTGGTGTGGATCA 47 8 



Qy 4 77 TAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCCAAAAGAAGAGG 536 

I I I I I I I I I I i I I I I I M I I I I I I 

Db 47 9 T T T CAC T G GT AG C T GT CAT T C C GAT GAC C T T C T T GAT CAC AT CAAC CAAC AG G AC CAAC A 53 8 



Qy 537 GCAGTAACT GCAT C GAC TAT GCAAGT T CTGGAAACCCT GAACACAAT CT CATTTACAGCC 596 

I III II I II I II II I I I II I I I I I I I II I I 

Db 539 GAT CAGC C T GT CT C GAC CT CAC CAGT T C GG ATGAACTCAATACTATTAAGTGGT 592 



Qy 597 TCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACA 656



I 1 1 1 II 1 1 1 1 I I 1 1 1 I 1 1 1 1 I II 

Db 593 ACAACCTGATTTTGACTGCAACTACTTTCTGCCTCCCCTTGGTGATAGTGACACTTTGCT 652 

Qy 657 AGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACA 716 

II I I I I II I I I I I I I I Ml 

Db 653 ATACCAC GATTAT C CACACT CT GAC CCAT GGACT GCAAACT GACAGCT GCCTTAAGCAGA 712 

Qy 717 AACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATC 776 

I I I I I I I I I I I | || M | || || I II I I I 

Db 713 AAGCACGAAGGCTAACCATTCTGCTACTCCTTGCATTTTACGTATGTTTTTTACCCTTCC 772 

Qy 777 AT AT CAT GCG C AAT TT GAG GAT C GC C T C AC GC CT G 811 

I I I I I I I I I I I I I I I I I I I I I I I 

Db 773 ATATCTTGAGGGTCATTCGGATCGAATCTCGCCTG 807 



RESULT 9 

US-09-885-453-2 

; Sequence 2, Application US/09885453
; Publication No. US20030088080A1
; GENERAL INFORMATION:
; APPLICANT: Communi, Didier
; TITLE OF INVENTION: RECEPTOR GPCRxlO
; FILE REFERENCE: 9409/2082
; CURRENT APPLICATION NUMBER: US/09/885,453
; CURRENT FILING DATE: 2001-06-20
; PRIOR APPLICATION NUMBER: US 09/885,453
; PRIOR FILING DATE: 2001-06-21
; NUMBER OF SEQ ID NOS: 12
; SOFTWARE: PatentIn version 3.1
; SEQ ID NO 2
; LENGTH: 1014
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:
; NAME/KEY: DNA nucleotide sequence
; LOCATION: (1)..(1014)
; OTHER INFORMATION: GPCRxlO DNA sequence
US-09-885-453-2 



Query Match 8.2%; Score 126.6; DB 11; Length 1014; 

Best Local Similarity 49.9%; Pred. No. 3e-21; 

Matches 377; Conservative 0; Mismatches 369; Indels 9; Gaps 2; 

Qy 60 CTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTT 119 

I I I I I I I I I I I II I I 1 I I I I I I I I I I I II III 

Db 59 C T TT T GGAAAT T G C AC T GAT GAAAACAT C C CACT CAAGAT GCACT AC CT C C CT GT T ATT T 118 

Qy 12 0 ATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCT 17 9 

Ml II I I I I I I I I I I I I I I I I lit I I I I I I I- I 

Db 119 AT GGCAT T AT C T T C C T C GT GG GAT T T C C AG G CAAT G CAGT AGT GAT AT C CACT T AC ATT T 17 8 



Qy 180 T C T G CAT GAAGAACT G GAAC AGC AG CAAT GT CT AT CTT T T T AAC CT T T C CAT C T CT GACT 2 39 

II I I I I I M I I I I I I I I I II I I I I I I I I II I I I I 

Db 179 T C AAAAT GAG AC C TT G GAAGAGC AGC AC CAT CATT AT G CT GAAC CT GG C CT GC ACAGAT C 2 38 



Qy 240 TTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAAT GATAAGGGGA 296



I I 111 1 1 1 1 1 1 1 1 1 1 1 1 1 II I I II 1 1 1 1 I 1 1 1 1 III 

Db 239 TGCTGTATCTGACCAGCCTCCCCTTCCTGATTCACTACTATGCCAGTGGCGAAAACTGGA 2 98 

Qy 2 97 C C TAT GGAGAT GTT C T CT GT AT AAG CAAC CGAT AT GT G CT T CAC AC CAAC CT C T ACAC C A 356 

I I I I I I I II I I I I I I I I I I I III I I I I I I M l M 

Db 2 99 TCTTTGGAGATTTCATGTGTAAGTTTATCCGCTTCAGCTTCCATTTCAACCTGTATAGCA 35 8 

Qy 357 GCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAAGTACCCTTTCC 416 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 

Db 35 9 GCATCCTCTTCCTCACCTGTTTCAGCATCTTCCGCTACTGTGTGATCATTCACCCAATGA 418 

Qy 417 GAGAACAC T T T CT ACAAAAGAAG GAAT T T GC CAT T T TAAT CTCGCTGGCT GT CTGGGCCT 47 6 

I I I I I I I I I 1 I I I I I I I II I I i III I 

Db 419 GCTGCTTTTCCATTCACAAAACTCGATGTGCAGTTGTAGCCTGTGCTGTGGTGTGGATCA 47 8 

Qy 477 T AGT GAC CT T AGAAGT T CT AC C CAT GCT CAC T T T CAT CAAT T C T GT CC CAAAAGAAGAGG 536 

I I I I I I I I I I I I I I I I I I I I I I I I 

Db 47 9 T T T CAC T G GT AG CT GT CAT T C C GAT GAC C T T C T T GAT CAC AT CAAC CAAC AG GAC CAAC A 53 8 

Qy 537 GC AGTAACT GCAT C GACT AT G C AAGT T CT GGAAAC C C T GAACACAAT CT C AT T T AC AG C C 596 

I III I IE I I I I I I I I I I I I I I I I I I Mil I 

Db 53 9 GAT CAGC C T GT CTC GAC CT CAC C AGT T C G G AT GAACT CAAT ACT AT T AAGT G GT 592 

Qy 5 97 TCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACA 65 6 

I I I I I I I I I! I I ' III I II II I II 

Db 593 ACAACCTGATTTTGACTGCAACTACTTTCTGCCTCCCCTTGGTGATAGTGACACTTTGCT 652 

Qy 657 AGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACA 716 

II ill 11 I I I I I I I I III 

Db 653 AT AC CAC GAT TAT C C ACACT CT GAC C CAT GGACT GCAAACT GACAGCT G C C T T AAGCAGA 712 

Qy 717 AACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATC 77 6 

I I I I I I 1 I I III I I III II II I I I I I I 

Db 713 AAGCACGAAGGCTAACCATTCTGCTACTCCTTGCATTTTACGTATGTTTTTTACCCTTCC 772 

Qy 777 ATATCATGCGCAATTTGAGGATCGCCTCACGCCTG 811 

Mill II I I I I I I I I I I I I M I I 

Db 773 ATATCTTGAGGGTCATTCGGATCGAATCTCGCCTG 807 



RESULT 10 
US-10-321-807-27 

; Sequence 27, Application US/10321807
; Publication No. US20030166148A1
; GENERAL INFORMATION:
; APPLICANT: Chen, Rupong
; APPLICANT: Dang, Huong T.
; APPLICANT: Lowitz, Kevin P.
; TITLE OF INVENTION: No. US20030166148A1-Endogenous, Constitutively Activated Human G Protein-Coupled
; TITLE OF INVENTION: Receptors
; FILE REFERENCE: AREN0086
; CURRENT APPLICATION NUMBER: US/10/321,807
; CURRENT FILING DATE: 2002-12-16
; PRIOR APPLICATION NUMBER: US/09/714,008
; PRIOR FILING DATE: 2000-11-16
; PRIOR APPLICATION NUMBER: 09/170,496
; PRIOR FILING DATE: 1999-11-17
; PRIOR APPLICATION NUMBER: PCT/US99/23938
; PRIOR FILING DATE: 2000-04-20
; PRIOR APPLICATION NUMBER: 60/166,088
; PRIOR FILING DATE: 1999-11-17
; PRIOR APPLICATION NUMBER: 60/166,099
; PRIOR FILING DATE: 1999-11-17
; PRIOR APPLICATION NUMBER: 60/166,369
; PRIOR FILING DATE: 1999-11-17
; PRIOR APPLICATION NUMBER: 60/171,902
; PRIOR FILING DATE: 1999-12-23
; PRIOR APPLICATION NUMBER: 60/171,901
; PRIOR FILING DATE: 1999-12-23
; PRIOR APPLICATION NUMBER: 60/171,900
; PRIOR FILING DATE: 1999-12-23
; PRIOR APPLICATION NUMBER: 60/181,749
; PRIOR FILING DATE: 2000-02-11
; Remaining Prior Application data removed - See File Wrapper or PALM.
; NUMBER OF SEQ ID NOS: 133
; SOFTWARE: PatentIn version 3.0
; SEQ ID NO 27
; LENGTH: 1014
; TYPE: DNA
; ORGANISM: Homo sapiens
US-10-321-807-27 

Query Match 8.2%; Score 126.6; DB 13; Length 1014; 

Best Local Similarity 49.9%; Pred. No. 3e-21; 

Matches 377; Conservative 0; Mismatches 369; Indels 9; Gaps 2; 

Qy 60 CTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTT 119 

11.11 | | | | I I II I I I I I I I I I II I I I I I Ml 

Db 59 CT T TT GGAAAT T GCACT GAT GAAAACAT C C C ACT CAAGAT GCACT AC C T C CC T GTT AT T T 118 

Qy 12 0 ATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCT 17 9 

III I I I I I I I M I I I I I I I I I II I I I I I I I I I 

Db 119 AT GGCAT TAT CT T C CT C GT GGGATT T C CAGGCAAT G C AGT AGT GAT AT C C ACT T AC AT T T 17 8 

Qy 180 TCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACT 239

II I I I I II I I I I I I I I I I II II I I I I I I II I I I I 

Db 17 9 T CAAAAT GAGAC CT T GGAAGAG C AGC AC CAT C ATT AT GCT GAAC CT GGC CT GC ACAGAT C 238 

Qy 24 0 TTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAAT GATAAGGGGA 296 

I I Ml I I II I I I 1 I I I I I I I I I I I I I I I t I I I I III 

Db 239 TGCTGTATCTGACCAGCCTCCCCTTCCTGATTCACTACTATGCCAGTGGCGAAAACTGGA 298 

Qy 2 97 C C TAT G GAGAT GTT CT CT GT AT AAGC AAC C GAT AT GT G CT T CACAC CAAC CT CT AC AC C A 356 

! I I I I I I I I I I I I I I I I I I I III II I I I I I I I I I 

Db 299 T CT TT G GAGAT T T CAT GT GT AAGT T TAT C CG CT T C AGCT T C CAT T T CAAC CT GT AT AG C A 358 

Qy 357 G CAT CCTCTTCCT C ACTT T CAT T AGCAT G GAC C GAT AT CT GC T CAT GAAGT AC C CT T T C C 416 

I M I I I I I 11 I M I I I I I I I I M Mill Mil I II I I 

Db 359 GCATCCTCTTCCTCACCTGTTTCAGCATCTTCCGCTACTGTGTGATCATTCACCCAATGA 418 

Qy 417 GAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCT 47 6 

I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 419 GCTGCTTTTCCATTCACAAAACTCGATGTGCAGTTGTAGCCTGTGCTGTGGTGTGGATCA 47 8 



Qy 4 77 T AGT GAC CT T AGAAGT T CT AC C CAT GCT CAC T T T CAT C AAT T C T GT C C CAAAAGAAGAGG 536 

I III II I I! Ill I II MM II I I I 

Db 4 79 T T T CAC T G GT AGC T GT CAT T C C GAT GAC C T T C T T GAT CAC AT C AAC C AAC AGGAC C AAC A 538 



Qy 537 G CAGTAAC T GC AT CGACT AT GC AAGT T CT GGAAAC C C T GAACAC AAT CT CAT T T ACAGC C 596 

I III I I I I I I I II I I I I I I I I I I II I II I I 

Db 539 GATCAGCCTGTCTCGACCTCACCAGTTCGG ATGAACTCAATACTATTAAGTGGT 592 

Qy 597 TCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACA 656 

I I I I I I I II I I I I I I i I I I I I II 

Db 593 ACAACCTGATTTTGACTGCAACTACTTTCTGCCTCCCCTTGGTGATAGTGACACTTTGCT 652 



Qy 657 AGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACA 716

II I I I I II I I I I I I IE ill 

Db 653 AT AC CAC GATT AT CC AC AC T C T GAC C CAT GGACT GCAAACT GAC AG CT GC C T T AAGC AGA 712 

Qy 717 AACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATC 776 

II I I I I I I I III II III II II I I I I I I 

Db 713 AAGCACGAAGGCTAACCATTCTGCTACTCCTTGCATTTTACGTATGTTTTTTACCCTTCC 7 72 



Qy 7 77 AT AT CAT G C GC AAT T T GAG GAT C GC CT C AC GC CT G 811 

I IE I I I I I I I II I I I II I I I I I I 

Db 773 ATATCTTGAGGGTCATTCGGATCGAATCTCGCCTG 807



RESULT 11 
US-10-270-144-1 

; Sequence 1, Application US/10270144
; Publication No. US20030049790A1
; GENERAL INFORMATION:
; APPLICANT: WEI, Ming-Hui et al
; TITLE OF INVENTION: ISOLATED HUMAN G-PROTEIN COUPLED
; TITLE OF INVENTION: RECEPTORS, NUCLEIC ACID MOLECULES ENCODING HUMAN GPCR
; TITLE OF INVENTION: PROTEINS, AND USES THEREOF
; FILE REFERENCE: CL000750CON
; CURRENT APPLICATION NUMBER: US/10/270,144
; CURRENT FILING DATE: 2002-10-15
; PRIOR APPLICATION NUMBER: 60/205,196
; PRIOR FILING DATE: 2000-05-18
; NUMBER OF SEQ ID NOS: 7
; SOFTWARE: FastSEQ for Windows Version 4.0
; SEQ ID NO 1
; LENGTH: 1014
; TYPE: DNA
; ORGANISM: Human
US-10-270-144-1 



Query Match 8.2%; Score 126.6; DB 15; Length 1014; 

Best Local Similarity 49.9%; Pred. No. 3e-21; 

Matches 377; Conservative 0; Mismatches 369; Indels 9; Gaps 2;

Qy 60 CTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTT 119 

I I I I I I I I 1 I I II I I I I I t I I I I I I I I I I I IE 

Db 59 CTTTTGGAAATTGCACTGATGAAAACATCCCACTCAAGATGCACTACCTCCCTGTTATTT 118 



Qy 120 ATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCT 179



Ill II I I I I I I I I I I MINI I I I I I I I I I I I 

Db 119 AT G G CAT TAT CT T C CT C GT GGGAT T T C CAGGC AAT GCAGT AGT GAT AT C CACT T AC AT T T 17 8 

Qy 180 T C T GC AT GAAGAACT GGAAC AGC AGCAAT GT C TAT CT T T T TAAC C T T T C CAT CT C T GAC T 239 

II MM I I I I I I I I I I I I II II I II I I I II I I I I 

Db 17 9 T CAAAAT GAG AC CT T GGAAGAGC AGCAC CAT CAT TAT GCT GAAC CT GGC CT GCAC AGAT C 238 

Qy 240 TTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAAT GATAAGGGGA 296 

I E III I I I I I I I I I I I I I I I I I I I I I I I I i 1 I I Ml 

Db 239 TGCTGTATCTGACCAGCCTCCCCTTCCTGATTCACTACTATGCCAGTGGCGAAAACTGGA 298 

Qy 297 C C TAT GGAGAT GT T CT C T GT AT AAGCAAC C GAT AT GT GCT T CACAC C AAC CT CT AC AC C A 356 

I I I II I I I I I I I I I I I I I I I III I I ! I II I I I I I 

Db 2 99 T CT T T GGAGATT T C AT GT GT AAGT T TAT C C G CT T C AGC T T C CAT T T C AAC CT GT AT AGC A 358 

Qy 357 GCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAAGTACCCTTTCC 416 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I III! MM I 

Db 359 GCATCCTCTTCCTCACCTGTTTCAGCATCTTCCGCTACTGTGTGATCATTCACCCAATGA 418 

Qy 417 GAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCT 47 6 

I I I I I I I I I I I I I I I I I M I I I I I I I 

Db 419 GCTGCTTTTCCATTCACAAAACTCGATGTGCAGTTGTAGCCTGTGCTGTGGTGTGGATCA 47 8 

Qy 477 T AGT GAC C T T AGAAGTT C T AC C CAT GCT CACT T T CAT CAAT T C T GT CC CAAAAGAAGAG G 536 

I I I I II I I I I II I I I II M I I I I I 

Db 479 TT T CACT GGT AG CT GT CAT T C C GAT GAC C T T CT T GAT C AC AT C AAC C AACAGGAC CAAC A 538 

Qy 537 G C AGT AACT G CAT C GACT AT G CAAGT T CT GGAAAC C CT GAACACAAT CT C ATT T AC AG C C 596 

I III Mill I . I II II I I I I I I I I M II I I I 

Db 539 GATCAGCCTGTCTCGACCTCACCAGTTCGG AT GAACT CAAT ACT ATT AAGT GGT 592 

Qy 597 TCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACA 656 

I M II I II II I ! I I I I I I M I II 

Db 593 ACAACCTGATTTTGACTGCAACTACTTTCTGCCTCCCCTTGGTGATAGTGACACTTTGCT 652 

Qy 657 AGAT GGT AGT CTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACA 716 

II I I I I II I I I I I I II III 

Db 653 AT AC CAC GAT TAT C CAC ACT C T GAC C CAT G GACT G C AAACT GAC AGCT GC CTT AAG CAGA 712 

Qy 717 AACCCCAACGCCT GGT GGT CCT GGC GGTTGT GAT CTT CTCTATACT CTT CACAC CCT AT C 776 

I I I I I I M I II I II III M II II I M I 

Db 713 AAGCACGAAGGCTAACCATTCTGCTACTCCTTGCATTTTACGTATGTTTTTTACCCTTCC 772 

Qy 77 7 AT AT CAT GC G CAAT T T GAG GAT C GC CT CAC G CCT G 811 

I II II II I I IMIM I I I I I I II 

Db 77 3 ATATCTTGAGGGTCATTCGGATCGAATCTCGCCTG 807 



RESULT 12 
US-10-188-405-7 

; Sequence 7, Application US/10188405
; Publication No. US20030082585A1
; GENERAL INFORMATION:
; APPLICANT: Tian, Hui
; APPLICANT: Dai, Kang
; APPLICANT: Chen, Jin-Long
; APPLICANT: Zhao, Jiagang
; APPLICANT: Cutler, Gene
; APPLICANT: Tularik Inc.
; TITLE OF INVENTION: No. US20030082585A1el Receptors
; FILE REFERENCE: 018781-008410US
; CURRENT APPLICATION NUMBER: US/10/188,405
; CURRENT FILING DATE: 2002-07-01
; PRIOR APPLICATION NUMBER: US 60/302,800
; PRIOR FILING DATE: 2001-07-03
; NUMBER OF SEQ ID NOS: 25
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 7
; LENGTH: 1014
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:
; OTHER INFORMATION: human TGR164
US-10-188-405-7 

Query Match 8.2%; Score 126.6; DB 15; Length 1014; 

Best Local Similarity 49.9%; Pred. No. 3e-21; 

Matches 377; Conservative 0; Mismatches 369; Indels 9; Gaps 2; 

Qy 60 CTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTT 119 

I I I I I Mill I II I I I I I t I I I I I I I I II ill 

Db 59 CT T TT GGAAATT GCAC T GAT GAAAACAT C C CACT C AAGAT G C AC T AC C TC C C T GT TAT T T 118 

Qy 12 0 ATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGT CACTGTGGTGTTCGGCTACCTCT 179 

Ml II I I I I I I II I I I I I I I I I I I I I I I I I I I 

Db 119 AT G GCAT T AT CT T C CT C GT G GGAT T T C CAG GCAAT GC AGT AGT GAT AT C CACT T AC AT T T 17 8 

Qy 180 TCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACT 239

II I I I I I I I I I I I I I I I I 1 I I ! I I I I I I II I I I I 

Db 17 9 T CAAAAT GAGACCTTGGAAGAGCAGCACCAT CAT T AT GCT GAAC CT GGCCT GCACAGAT C 23 8 

Qy 240 TTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAAT GATAAGGGGA 296 

I I III I I I 11 I I I I I I I I I I I . I I I II I I I I I I I III 

Db 2 39 TGCTGTATCTGACCAGCCTCCCCTTCCTGATTCACTACTATGCCAGTGGCGAAAACTGGA 2 98 

Qy 297 CCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCA 356

II I I I I I I I I I I I I I I I I I I III I I I I I I II I I I 

Db 2 99 TCTTTGGAGATTTCATGTGTAAGTTTATCCGCTTCAGCTTCCATTTCAACCTGTATAGCA 358 

Qy 357 G CAT CCTCTTCCT C ACT T T CAT T AGCAT GGAC C GAT AT CT GCT CAT GAAGT AC C CT TT C C 416 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 359 GCATCCTCTTCCTCACCTGTTTCAGCATCTTCCGCTACTGTGTGATCATTCACCCAATGA 418 

Qy 417 GAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCT 47 6 

I I I II II I IE I I I I I I I I I I I I I I I I 

Db 419 GCT GCTTTTCCATTCACAAAACTCGATGTGCAGTTGTAGCCTGT GCT GTGGTGT GGAT CA 47 8 

Qy 4 77 T AGT GAC C T T AGAAGT T CT ACC C AT GCT CACT TT CAT CAAT T CT GT C C CAAAAGAAGAG G 536 

I I I I I I I I I I I I I I I I I I I II I I I 

Db 479 T T T CACT GGT AG C T GT CAT T CC GAT GAC CT T CT T GAT CAC AT C AAC C AACAG GACCAAC A 538 

Qy 537 G C AGT AACT G CAT C GAC TAT G CAAGT T C T G GAAAC C C T GAACACAAT C T CAT T T AC AG C C 596 

I III I I II I I I I I I I I I I I I I I I I I I I I I I 

Db 539 GATCAGCCTGTCTCGACCTCACCAGTTCGG ATGAACTCAATACTATTAAGTGGT 592 



Qy 597 TCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACA 656 

I I I I I I I I I I I I III I II II I II 

Db 593 ACAACCTGATTTTGACTGCAACTACTTTCTGCCTCCCCTTGGTGATAGTGACACTTTGCT 652 

Qy 657 AGAT GGT AGT CT T CT T AAAG AGGAGGAGC CAG C AGC AAG CAAC TGCCCTGC CACT G GAC A 716 

II I I I I II 1 I I I I I II III 

Db 653 AT AC C AC GAT TAT C C AC ACT CT GAC C C AT GGAC T G C AAAC T GACAGC T G C CT T AAG C AGA 712 

Qy 717 AACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATC 77 6 

I I I I I I I I I II I I I - II I II I! I II II I 

Db 713 AAGCACGAAGGCTAACCATTCTGCTACTCCTTGCATTTTACGTATGTTTTTTACCCTTCC 772 



Qy 777 AT AT CAT GC G C AAT T T GAG GAT C GC CT C AC GC CT G 811 

I I I I I I I I I I I I I II I I I I I I I I 

Db 773 ATATCTTGAGGGTCATTCGGATCGAATCTCGCCTG 807 



RESULT 13 
US-10-079-384-13 

; Sequence 13, Application US/10079384
; Publication No. US20030108986A1
; GENERAL INFORMATION:
; APPLICANT: Communi, Didier
; TITLE OF INVENTION: COMPOSITIONS AND METHODS COMPRISING G-PROTEIN COUPLED RECEPTORS
; FILE REFERENCE: 9409/2132
; CURRENT APPLICATION NUMBER: US/10/079,384
; CURRENT FILING DATE: 2002-02-20
; PRIOR APPLICATION NUMBER: US 09/885,453
; PRIOR FILING DATE: 2001-06-20
; NUMBER OF SEQ ID NOS: 50
; SOFTWARE: PatentIn version 3.1
; SEQ ID NO 13
; LENGTH: 1014
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:
; NAME/KEY: CDS
; LOCATION: (1)..(1014)
; OTHER INFORMATION:
US-10-079-384-13 

Query Match 8.2%; Score 126.6; DB 15; Length 1014; 

Best Local Similarity 49.9%; Pred. No. 3e-21; 

Matches 377; Conservative 0; Mismatches 369; Indels 9; Gaps 2; 

Qy    60 CTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTT 119
         ||| ||  |||||    |    | |        | || | | |||||||| |||   |||
Db    59 CTTTTGGAAATTGCACTGATGAAAACATCCCACTCAAGATGCACTACCTCCCTGTTATTT 118

Qy   120 ATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCT 179
         |||  ||    ||| |  | ||| | |  || ||||     ||| | | |   ||| | |
Db   119 ATGGCATTATCTTCCTCGTGGGATTTCCAGGCAATGCAGTAGTGATATCCACTTACATTT 178

Qy   180 TCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACT 239
         ||   ||||     ||||| |||||||   ||  | |  | |||||  ||  | | ||
Db   179 TCAAAATGAGACCTTGGAAGAGCAGCACCATCATTATGCTGAACCTGGCCTGCACAGATC 238

Qy   240 TTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAAT---GATAAGGGGA 296
         |    |  |||  || ||| ||| |||||||  |    ||||||| |   || ||  |||
Db   239 TGCTGTATCTGACCAGCCTCCCCTTCCTGATTCACTACTATGCCAGTGGCGAAAACTGGA 298

Qy   297 CCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCA 356
          || ||||||| |  | ||||     | ||| |      | ||   |||||| || | ||
Db   299 TCTTTGGAGATTTCATGTGTAAGTTTATCCGCTTCAGCTTCCATTTCAACCTGTATAGCA 358

Qy   357 GCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAAGTACCCTTTCC 416
         |||||||||||||||| |   | |||||   ||| ||     | || |   ||||  |
Db   359 GCATCCTCTTCCTCACCTGTTTCAGCATCTTCCGCTACTGTGTGATCATTCACCCAATGA 418

Qy   417 GAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCT 476
         |       |   | || || |    || |||  || ||  ||     |  || |||  |
Db   419 GCTGCTTTTCCATTCACAAAACTCGATGTGCAGTTGTAGCCTGTGCTGTGGTGTGGATCA 478

Qy   477 TAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCCAAAAGAAGAGG 536
         |        |||  ||  | || |||  |   || ||||  ||   |   |      |
Db   479 TTTCACTGGTAGCTGTCATTCCGATGACCTTCTTGATCACATCAACCAACAGGACCAACA 538

Qy   537 GCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAACACAATCTCATTTACAGCC 596
         |      |||  |||||    | ||||| |       ||||| ||||   ||| |  |
Db   539 GATCAGCCTGTCTCGACCTCACCAGTTCGG------ATGAACTCAATACTATTAAGTGGT 592

Qy   597 TCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACA 656
          |  ||||| ||||   |         |    ||| |  || ||    |       | |
Db   593 ACAACCTGATTTTGACTGCAACTACTTTCTGCCTCCCCTTGGTGATAGTGACACTTTGCT 652

Qy   657 AGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACA 716
         | |      |  ||   |    ||        | |||| |      |   |    | | |
Db   653 ATACCACGATTATCCACACTCTGACCCATGGACTGCAAACTGACAGCTGCCTTAAGCAGA 712

Qy   717 AACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATC 776
         || | | | | ||     | |||    |  |    || |   ||   ||   |||||  |
Db   713 AAGCACGAAGGCTAACCATTCTGCTACTCCTTGCATTTTACGTATGTTTTTTACCCTTCC 772

Qy   777 ATATCATGCGCAATTTGAGGATCGCCTCACGCCTG 811
         ||||| || |     |  ||||||  || ||||||
Db   773 ATATCTTGAGGGTCATTCGGATCGAATCTCGCCTG 807



RESULT 14
US-10-225-567A-646
; Sequence 646, Application US/10225567A
; Publication No. US20030113798A1
; GENERAL INFORMATION:
; APPLICANT: Lifespan Biosciences
; APPLICANT: Brown, Joseph P.
; APPLICANT: Burmer, Glenna C.
; APPLICANT: Roush, Christine L.
; TITLE OF INVENTION: ANTIGENIC PEPTIDES AND ANTIBODIES FOR G PROTEIN-COUPLED RECEPTORS (GPCRS)
; FILE REFERENCE: 1920-4-4
; CURRENT APPLICATION NUMBER: US/10/225,567A
; CURRENT FILING DATE: 2001-12-19
; PRIOR APPLICATION NUMBER: 60/257,144
; PRIOR FILING DATE: 2000-12-19
; NUMBER OF SEQ ID NOS: 2292
; SOFTWARE: PatentIn version 3.1
; SEQ ID NO 646
; LENGTH: 1014
; TYPE: DNA
; ORGANISM: Homo sapiens
US-10-225-567A-646



Query Match 8.2%; Score 126.6; DB 15; Length 1014; 

Best Local Similarity 49.9%; Pred. No. 3e-21; 

Matches 377; Conservative 0; Mismatches 369; Indels 9; Gaps 2; 

Qy    60 CTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTT 119
         ||| ||  |||||    |    | |        | || | | |||||||| |||   |||
Db    59 CTTTTGGAAATTGCACTGATGAAAACATCCCACTCAAGATGCACTACCTCCCTGTTATTT 118

Qy   120 ATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCT 179
         |||  ||    ||| |  | ||| | |  || ||||     ||| | | |   ||| | |
Db   119 ATGGCATTATCTTCCTCGTGGGATTTCCAGGCAATGCAGTAGTGATATCCACTTACATTT 178

Qy   180 TCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACT 239
         ||   ||||     ||||| |||||||   ||  | |  | |||||  ||  | | ||
Db   179 TCAAAATGAGACCTTGGAAGAGCAGCACCATCATTATGCTGAACCTGGCCTGCACAGATC 238

Qy   240 TTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAAT---GATAAGGGGA 296
         |    |  |||  || ||| ||| |||||||  |    ||||||| |   || ||  |||
Db   239 TGCTGTATCTGACCAGCCTCCCCTTCCTGATTCACTACTATGCCAGTGGCGAAAACTGGA 298

Qy   297 CCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCA 356
          || ||||||| |  | ||||     | ||| |      | ||   |||||| || | ||
Db   299 TCTTTGGAGATTTCATGTGTAAGTTTATCCGCTTCAGCTTCCATTTCAACCTGTATAGCA 358

Qy   357 GCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAAGTACCCTTTCC 416
         |||||||||||||||| |   | |||||   ||| ||     | || |   ||||  |
Db   359 GCATCCTCTTCCTCACCTGTTTCAGCATCTTCCGCTACTGTGTGATCATTCACCCAATGA 418

Qy   417 GAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCT 476
         |       |   | || || |    || |||  || ||  ||     |  || |||  |
Db   419 GCTGCTTTTCCATTCACAAAACTCGATGTGCAGTTGTAGCCTGTGCTGTGGTGTGGATCA 478

Qy   477 TAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCCAAAAGAAGAGG 536
         |        |||  ||  | || |||  |   || ||||  ||   |   |      |
Db   479 TTTCACTGGTAGCTGTCATTCCGATGACCTTCTTGATCACATCAACCAACAGGACCAACA 538

Qy   537 GCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAACACAATCTCATTTACAGCC 596
         |      |||  |||||    | ||||| |       ||||| ||||   ||| |  |
Db   539 GATCAGCCTGTCTCGACCTCACCAGTTCGG------ATGAACTCAATACTATTAAGTGGT 592

Qy   597 TCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACA 656
          |  ||||| ||||   |         |    ||| |  || ||    |       | |
Db   593 ACAACCTGATTTTGACTGCAACTACTTTCTGCCTCCCCTTGGTGATAGTGACACTTTGCT 652

Qy   657 AGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACA 716
         | |      |  ||   |    ||        | |||| |      |   |    | | |
Db   653 ATACCACGATTATCCACACTCTGACCCATGGACTGCAAACTGACAGCTGCCTTAAGCAGA 712

Qy   717 AACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATC 776
         || | | | | ||     | |||    |  |    || |   ||   ||   |||||  |
Db   713 AAGCACGAAGGCTAACCATTCTGCTACTCCTTGCATTTTACGTATGTTTTTTACCCTTCC 772

Qy   777 ATATCATGCGCAATTTGAGGATCGCCTCACGCCTG 811
         ||||| || |     |  ||||||  || ||||||
Db   773 ATATCTTGAGGGTCATTCGGATCGAATCTCGCCTG 807



RESULT 15
US-10-010-568-1
; Sequence 1, Application US/10010568
; Publication No. US20030157598A1
; GENERAL INFORMATION:
; APPLICANT: Bristol-Myers Squibb Company
; TITLE OF INVENTION: A NOVEL HUMAN G-PROTEIN COUPLED RECEPTOR, HGPRBMY23, EXPRESSED HIGHLY IN
; TITLE OF INVENTION: KIDNEY
; FILE REFERENCE: D0077 NP
; CURRENT APPLICATION NUMBER: US/10/010,568
; CURRENT FILING DATE: 2001-12-07
; PRIOR APPLICATION NUMBER: US 60/251,926
; PRIOR FILING DATE: 2000-12-07
; PRIOR APPLICATION NUMBER: US 60/269,795
; PRIOR FILING DATE: 2001-02-14
; NUMBER OF SEQ ID NOS: 55
; SOFTWARE: PatentIn version 3.0
; SEQ ID NO 1
; LENGTH: 1081
; TYPE: DNA
; ORGANISM: homo sapiens
; FEATURE:
; NAME/KEY: CDS
; LOCATION: (54)..(1064)
US-10-010-568-1



Query Match 8.2%; Score 126.6; DB 13; Length 1081; 

Best Local Similarity 49.9%; Pred. No. 3.1e-21; 

Matches 377; Conservative 0; Mismatches 369; Indels 9; Gaps 2; 

Qy    60 CTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTT 119
         ||| ||  |||||    |    | |        | || | | |||||||| |||   |||
Db   112 CTTTTGGAAATTGCACTGATGAAAACATCCCACTCAAGATGCACTACCTCCCTGTTATTT 171

Qy   120 ATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCT 179
         |||  ||    ||| |  | ||| | |  || ||||     ||| | | |   ||| | |
Db   172 ATGGCATTATCTTCCTCGTGGGATTTCCAGGCAATGCAGTAGTGATATCCACTTACATTT 231

Qy   180 TCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACT 239
         ||   ||||     ||||| |||||||   ||  | |  | |||||  ||  | | ||
Db   232 TCAAAATGAGACCTTGGAAGAGCAGCACCATCATTATGCTGAACCTGGCCTGCACAGATC 291

Qy   240 TTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAAT---GATAAGGGGA 296
         |    |  |||  || ||| ||| |||||||  |    ||||||| |   || ||  |||
Db   292 TGCTGTATCTGACCAGCCTCCCCTTCCTGATTCACTACTATGCCAGTGGCGAAAACTGGA 351

Qy   297 CCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCA 356
          || ||||||| |  | ||||     | ||| |      | ||   |||||| || | ||
Db   352 TCTTTGGAGATTTCATGTGTAAGTTTATCCGCTTCAGCTTCCATTTCAACCTGTATAGCA 411

Qy   357 GCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAAGTACCCTTTCC 416
         |||||||||||||||| |   | |||||   ||| ||     | || |   ||||  |
Db   412 GCATCCTCTTCCTCACCTGTTTCAGCATCTTCCGCTACTGTGTGATCATTCACCCAATGA 471

Qy   417 GAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCT 476
         |       |   | || || |    || |||  || ||  ||     |  || |||  |
Db   472 GCTGCTTTTCCATTCACAAAACTCGATGTGCAGTTGTAGCCTGTGCTGTGGTGTGGATCA 531

Qy   477 TAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCCAAAAGAAGAGG 536
         |        |||  ||  | || |||  |   || ||||  ||   |   |      |
Db   532 TTTCACTGGTAGCTGTCATTCCGATGACCTTCTTGATCACATCAACCAACAGGACCAACA 591

Qy   537 GCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAACACAATCTCATTTACAGCC 596
         |      |||  |||||    | ||||| |       ||||| ||||   ||| |  |
Db   592 GATCAGCCTGTCTCGACCTCACCAGTTCGG------ATGAACTCAATACTATTAAGTGGT 645

Qy   597 TCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACA 656
          |  ||||| ||||   |         |    ||| |  || ||    |       | |
Db   646 ACAACCTGATTTTGACTGCAACTACTTTCTGCCTCCCCTTGGTGATAGTGACACTTTGCT 705

Qy   657 AGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACA 716
         | |      |  ||   |    ||        | |||| |      |   |    | | |
Db   706 ATACCACGATTATCCACACTCTGACCCATGGACTGCAAACTGACAGCTGCCTTAAGCAGA 765

Qy   717 AACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATC 776
         || | | | | ||     | |||    |  |    || |   ||   ||   |||||  |
Db   766 AAGCACGAAGGCTAACCATTCTGCTACTCCTTGCATTTTACGTATGTTTTTTACCCTTCC 825

Qy   777 ATATCATGCGCAATTTGAGGATCGCCTCACGCCTG 811
         ||||| || |     |  ||||||  || ||||||
Db   826 ATATCTTGAGGGTCATTCGGATCGAATCTCGCCTG 860



Search completed: December 14, 2003, 17:43:01 
Job time : 519 secs
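The per-hit summary lines in the listing above (Query Match, Score, Best Local Similarity, Matches, Mismatches, Indels, Gaps) appear to be tied together by simple arithmetic. The sketch below reproduces the printed percentages from the printed counts. The formulas are inferred from the numbers in this report, not taken from GenCore documentation, and the function names are hypothetical. In particular, the mismatch penalty of 0.6 is an assumption: it is the value that makes the reported scores come out, and the contents of the IDENTITY_NUC table are not shown in the report.

```python
def query_match_pct(score: float, perfect_score: float) -> float:
    """Score expressed as a percentage of the query's perfect (self-match) score."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_pct(matches: int, mismatches: int, indels: int) -> float:
    """Matching columns as a percentage of all aligned columns.

    Assumption: the alignment length is matches + mismatches + indels.
    """
    columns = matches + mismatches + indels
    return round(100.0 * matches / columns, 1)


def inferred_score(matches: int, mismatches: int, indels: int, gaps: int,
                   mismatch_penalty: float = 0.6,   # assumption, inferred from the report
                   gapop: float = 10.0,             # "Gapop 10.0" from the report header
                   gapext: float = 1.0) -> float:   # "Gapext 1.0" from the report header
    """Score under an assumed +1 match, -0.6 mismatch, affine-gap scheme."""
    gap_cost = gapop * gaps + gapext * indels
    return matches - mismatch_penalty * mismatches - gap_cost


# Figures printed for RESULT 13 (perfect score 1543).
print(query_match_pct(126.6, 1543))            # Query Match
print(best_local_similarity_pct(377, 369, 9))  # Best Local Similarity
print(inferred_score(377, 369, 9, 2))          # compare with Score 126.6
```

If the assumed scheme is right, the same functions should also reproduce any other hit in the report from its own Matches/Mismatches/Indels/Gaps line.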



GenCore version 5.1.6 
Copyright (c) 1993 - 2003 Compugen Ltd. 



OM nucleic - nucleic search, using sw model 
Run on: December 14, 2003, 11:13:44 ; Search time 5878 Seconds
(without alignments)
10738.951 Million cell updates/sec

Title: US-09-891-138A-1
Perfect score: 1543
Sequence: 1 gctcctggcagagttttctg tgcctaaataaatcaatata 1543

Scoring table: IDENTITY_NUC
Gapop 10.0 , Gapext 1.0

Searched: 2888711 seqs, 20454813386 residues

Total number of hits satisfying chosen parameters: 5777422

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries



Database

GenEmbl:*
1: gb_ba:*
2: gb_htg:*
3: gb_in:*
4: gb_om:*
5: gb_ov:*
6: gb_pat:*
7: gb_ph:*
8: gb_pl:*
9: gb_pr:*
10: gb_ro:*
11: gb_sts:*
12: gb_sy:*
13: gb_un:*
14: gb_vi:*
15: em_ba:*
16: em_fun:*
17: em_hum:*
18: em_in:*
19: em_mu:*
20: em_om:*
21: em_or:*
22: em_ov:*
23: em_pat:*
24: em_ph:*
25: em_pl:*
26: em_ro:*
27: em_sts:*
28: em_un:*
29: em_vi:*
30: em_htg_hum:*
31: em_htg_inv:*
32: em_htg_other:*
33: em_htg_mus:*
34: em_htg_pln:*
35: em_htg_rod:*
36: em_htg_mam:*
37: em_htg_vrt:*
38: em_sy:*
39: em_htgo_hum:*
40: em_htgo_mus:*
41: em_htgo_other:



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
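The throughput figure in the header of this search can be cross-checked from the other header values. The sketch below assumes that one "cell update" is one query-residue by database-residue comparison, and that both strands were searched; the second assumption is suggested by the hit total (5777422) being exactly twice the number of sequences searched (2888711), but neither is stated in the report. The function name is hypothetical.

```python
def million_cell_updates_per_sec(query_len: int, db_residues: int,
                                 seconds: float, strands: int = 2) -> float:
    """Dynamic-programming cells filled per second, in millions.

    strands=2 is an assumption (forward and reverse-complement search).
    """
    cells = query_len * db_residues * strands
    return cells / seconds / 1e6


# Header values of the GenEmbl search: 1543-residue query,
# 20454813386 database residues, 5878 seconds.
rate = million_cell_updates_per_sec(1543, 20_454_813_386, 5878)
print(round(rate, 3))  # compare with "10738.951 Million cell updates/sec"
```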
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.1 
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9 
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ALIGNMENTS 



RESULT 1 
AX376573 

LOCUS       AX376573     1543 bp    DNA     linear   PAT 01-MAR-2002
DEFINITION  Sequence 1 from Patent WO0200719.
ACCESSION   AX376573
VERSION     AX376573.1  GI:19170674
KEYWORDS
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1
  AUTHORS   Lin,D.C., Zhao,J., Chen,J.L. and Cutler,G.
  TITLE     Novel receptors
  JOURNAL   Patent: WO 0200719-A 1 03-JAN-2002;
            Tularik Inc. (US)
FEATURES             Location/Qualifiers
     source          1..1543
                     /organism="Mus musculus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:10090"
     CDS             44..997
                     /note="mouse TGR18 G-protein coupled receptor (GPCR)"
                     /codon_start=1
                     /protein_id="CAD26816.1"
                     /db_xref="GI:19170675"
                     /translation="MAQNLSCENWLATEAILNKYYLSAFYAIEFIFGLLGNVTVVFGY
                     LFCMKNWNSSNVYLFNLSISDFAFLCTLPILIKSYANDKGTYGDVLCISNRYVLHTNL
                     YTSILFLTFISMDRYLLMKYPFREHFLQKKEFAILISLAVWALVTLEVLPMLTFINSV
                     PKEEGSNCIDYASSGNPEHNLIYSLCLTLLGFLIPLSVMCFFYYKMVVFLKRRSQQQA
                     TALPLDKPQRLVVLAVVIFSILFTPYHIMRNLRIASRLDSWPQGCTQKAIKSIYTLTR
                     PLAFLNSAINPIFYFLMGDHYREMLISKFRQYFKSLTSFRT"
BASE COUNT      438 a    352 c    293 g    460 t
ORIGIN

Query Match 100.0%; Score 1543; DB 6; Length 1543; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 1543; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 



Qy     1 GCTCCTGGCAGAGTTTTCTGTCGAGACAGAAGCCGACAGCAGAATGGCACAGAATTTATC 60
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     1 GCTCCTGGCAGAGTTTTCTGTCGAGACAGAAGCCGACAGCAGAATGGCACAGAATTTATC 60

Qy    61 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTTA 120
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    61 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTTA 120

Qy   121 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCTT 180
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   121 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCTT 180

Qy   181 CTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACTT 240
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   181 CTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACTT 240

Qy   241 TGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATAAGGGGACCTA 300
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   241 TGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATAAGGGGACCTA 300

Qy   301 TGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCAGCAT 360
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   301 TGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCAGCAT 360

Qy   361 CCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAAGTACCCTTTCCGAGA 420
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   361 CCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAAGTACCCTTTCCGAGA 420

Qy   421 ACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCTTAGT 480
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   421 ACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCTTAGT 480

Qy   481 GACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCCAAAAGAAGAGGGCAG 540
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   481 GACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCCAAAAGAAGAGGGCAG 540

Qy   541 TAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAACACAATCTCATTTACAGCCTCTG 600
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   541 TAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAACACAATCTCATTTACAGCCTCTG 600

Qy   601 CCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACAAGAT 660
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   601 CCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACAAGAT 660

Qy   661 GGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACAAACC 720
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   661 GGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACAAACC 720

Qy   721 CCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATCATAT 780
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   721 CCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATCATAT 780

Qy   781 CATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTTGGCCACAAGGATGTACACAGAA 840
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   781 CATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTTGGCCACAAGGATGTACACAGAA 840

Qy   841 GGCCATCAAATCTATATACACACTGACACGGCCTCTGGCCTTTCTGAACAGTGCCATCAA 900
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   841 GGCCATCAAATCTATATACACACTGACACGGCCTCTGGCCTTTCTGAACAGTGCCATCAA 900

Qy   901 TCCCATCTTCTACTTCCTCATGGGAGACCATTACAGAGAGATGCTGATTAGTAAGTTCAG 960
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   901 TCCCATCTTCTACTTCCTCATGGGAGACCATTACAGAGAGATGCTGATTAGTAAGTTCAG 960

Qy   961 ACAATACTTCAAGTCCCTTACATCCTTCAGGACATGAGCTGCTGGATGCAGGTCTTCACT 1020
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   961 ACAATACTTCAAGTCCCTTACATCCTTCAGGACATGAGCTGCTGGATGCAGGTCTTCACT 1020

Qy  1021 CAGCCAAAATGAGACACTTGATAAACAGTGCTGTGCAGTTGAGTTTTAACTAAGTAAACC 1080
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1021 CAGCCAAAATGAGACACTTGATAAACAGTGCTGTGCAGTTGAGTTTTAACTAAGTAAACC 1080

Qy  1081 ACCATTTCTAGGCTTTAGCTTTCCACCATCCTCCAACCCCCAGGGCTGGAGTACAAGCTG 1140
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1081 ACCATTTCTAGGCTTTAGCTTTCCACCATCCTCCAACCCCCAGGGCTGGAGTACAAGCTG 1140

Qy  1141 GGTCCACATGAATCAGAAGGCAGCTCTCTGTTCTGATTTTAGGTTATACCCAGAGTATGG 1200
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1141 GGTCCACATGAATCAGAAGGCAGCTCTCTGTTCTGATTTTAGGTTATACCCAGAGTATGG 1200

Qy  1201 AAAAAATAAGGCATGAGAAAGCATTGACATCTTCACTTAAGAACTGAACAAAAGAGAACA 1260
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1201 AAAAAATAAGGCATGAGAAAGCATTGACATCTTCACTTAAGAACTGAACAAAAGAGAACA 1260

Qy  1261 AATATTGTCAATGTTTGGACACTTAGGATCTGAAATCTTGGAAATTTTAAGACCTCTTTT 1320
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1261 AATATTGTCAATGTTTGGACACTTAGGATCTGAAATCTTGGAAATTTTAAGACCTCTTTT 1320

Qy  1321 TCTATCAGTGTAAAAGGAATACAAGATAGCTAGTTGCAAATGCTGAATGCATTTCATCAT 1380
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1321 TCTATCAGTGTAAAAGGAATACAAGATAGCTAGTTGCAAATGCTGAATGCATTTCATCAT 1380

Qy  1381 TGGTCAGGTCGATAAGCGTGTTTCTGAAATAGTCTTATTTTTATTCTTGTAATATTAAAA 1440
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1381 TGGTCAGGTCGATAAGCGTGTTTCTGAAATAGTCTTATTTTTATTCTTGTAATATTAAAA 1440

Qy  1441 TTTATGTGAAAAATGAATATAATTCAATGTACAACATTAGATTTTCTATTTGAAAATTAT 1500
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1441 TTTATGTGAAAAATGAATATAATTCAATGTACAACATTAGATTTTCTATTTGAAAATTAT 1500

Qy  1501 ATTTCTTGAAAAAATAACTGCTGTGCCTAAATAAATCAATATA 1543
         |||||||||||||||||||||||||||||||||||||||||||
Db  1501 ATTTCTTGAAAAAATAACTGCTGTGCCTAAATAAATCAATATA 1543



RESULT 2 
AF295367 
LOCUS       AF295367     1598 bp    mRNA    linear   ROD 06-APR-2001
DEFINITION  Mus musculus G-protein coupled receptor GPR91 mRNA, complete cds.
ACCESSION   AF295367
VERSION     AF295367.1  GI:12711490
KEYWORDS
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 1598)
  AUTHORS   Wittenberger,T., Schaller,H.C. and Hellebrand,S.
  TITLE     An expressed sequence tag (EST) data mining strategy succeeding in
            the discovery of new G-protein coupled receptors
  JOURNAL   J. Mol. Biol. 307 (3), 799-813 (2001)
  MEDLINE   21172992
   PUBMED   11273702
REFERENCE   2  (bases 1 to 1598)
  AUTHORS   Wittenberger,T., Schaller,C.H. and Hellebrand,S.
  TITLE     Direct Submission
  JOURNAL   Submitted (14-AUG-2000) ZMNH, Institut fur
            Entwicklungsneurobiologie, Martinistr. 52, Hamburg 20246, Germany
FEATURES             Location/Qualifiers
     source          1..1598
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL"
                     /db_xref="taxon:10090"
     CDS             74..1027
                     /note="orphan receptor"
                     /codon_start=1
                     /product="G-protein coupled receptor GPR91"
                     /protein_id="AAK01867.1"
                     /db_xref="GI:12711491"
                     /translation="MAQNLSCENWLATEAILNKYYLSAFYAIEFIFGLLGNVTVVFGY
                     LFCMKNWNSSNVYLFNLSISDFAFLCTLPILIKSYANDKGTYGDVLCISNRYVLHTNL
                     YTSMLLLTVISMDRYLLMKYPFREHFLQKKEFAILISLAVWALVTLEVLPMLTFINSV
                     PKEEGSNCIDYASSGNPEHNLIYSLCLTLLGFLIPLSVMCFFYYKMVVFLKRRSQQQA
                     TALPLDKPQRLVVLAVVIFSILFTPYHIMRNLRIASRLDSWPQGCTQKAIKSIYTLTR
                     PLAFLNSAINPIFYFLMGDHYREMLISKFRQYFKSLTSFRT"
BASE COUNT      465 a    358 c    303 g    472 t
ORIGIN

Query Match 99.4%; Score 1533.4; DB 10; Length 1598;

Best Local Similarity 99.6%; Pred. No. 0;

Matches 1537; Conservative 0; Mismatches 6; Indels 0; Gaps 0;



Qy     1 GCTCCTGGCAGAGTTTTCTGTCGAGACAGAAGCCGACAGCAGAATGGCACAGAATTTATC 60
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    31 GCTCCTGGCAGAGTTTTCTGTCGAGACAGAAGCCGACAGCAGAATGGCACAGAATTTATC 90

Qy    61 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTTA 120
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    91 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTTA 150

Qy   121 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCTT 180
         |||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||
Db   151 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTTGGCTACCTCTT 210

Qy   181 CTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACTT 240
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   211 CTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACTT 270

Qy   241 TGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATAAGGGGACCTA 300
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   271 TGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATAAGGGGACCTA 330

Qy   301 TGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCAGCAT 360
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   331 TGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCAGCAT 390

Qy   361 CCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAAGTACCCTTTCCGAGA 420
          ||||| |||||| ||||||||||||||||||||||||||||||||||||||||||||||
Db   391 GCTCTTGCTCACTGTCATTAGCATGGACCGATATCTGCTCATGAAGTACCCTTTCCGAGA 450

Qy   421 ACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCTTAGT 480
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   451 ACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCTTAGT 510

Qy   481 GACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCCAAAAGAAGAGGGCAG 540
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   511 GACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCCAAAAGAAGAGGGCAG 570

Qy   541 TAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAACACAATCTCATTTACAGCCTCTG 600
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   571 TAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAACACAATCTCATTTACAGCCTCTG 630

Qy   601 CCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACAAGAT 660
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   631 CCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACAAGAT 690

Qy   661 GGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACAAACC 720
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   691 GGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACAAACC 750

Qy   721 CCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATCATAT 780
         ||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||
Db   751 CCAACGCCTGGTGGTCCTGGCAGTTGTGATCTTCTCTATACTCTTCACACCCTATCATAT 810

Qy   781 CATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTTGGCCACAAGGATGTACACAGAA 840
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   811 CATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTTGGCCACAAGGATGTACACAGAA 870

Qy   841 GGCCATCAAATCTATATACACACTGACACGGCCTCTGGCCTTTCTGAACAGTGCCATCAA 900
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   871 GGCCATCAAATCTATATACACACTGACACGGCCTCTGGCCTTTCTGAACAGTGCCATCAA 930

Qy   901 TCCCATCTTCTACTTCCTCATGGGAGACCATTACAGAGAGATGCTGATTAGTAAGTTCAG 960
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   931 TCCCATCTTCTACTTCCTCATGGGAGACCATTACAGAGAGATGCTGATTAGTAAGTTCAG 990

Qy   961 ACAATACTTCAAGTCCCTTACATCCTTCAGGACATGAGCTGCTGGATGCAGGTCTTCACT 1020
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   991 ACAATACTTCAAGTCCCTTACATCCTTCAGGACATGAGCTGCTGGATGCAGGTCTTCACT 1050

Qy  1021 CAGCCAAAATGAGACACTTGATAAACAGTGCTGTGCAGTTGAGTTTTAACTAAGTAAACC 1080
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1051 CAGCCAAAATGAGACACTTGATAAACAGTGCTGTGCAGTTGAGTTTTAACTAAGTAAACC 1110

Qy  1081 ACCATTTCTAGGCTTTAGCTTTCCACCATCCTCCAACCCCCAGGGCTGGAGTACAAGCTG 1140
         |||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||
Db  1111 ACCATTTCTACGCTTTAGCTTTCCACCATCCTCCAACCCCCAGGGCTGGAGTACAAGCTG 1170

Qy  1141 GGTCCACATGAATCAGAAGGCAGCTCTCTGTTCTGATTTTAGGTTATACCCAGAGTATGG 1200
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1171 GGTCCACATGAATCAGAAGGCAGCTCTCTGTTCTGATTTTAGGTTATACCCAGAGTATGG 1230

Qy  1201 AAAAAATAAGGCATGAGAAAGCATTGACATCTTCACTTAAGAACTGAACAAAAGAGAACA 1260
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1231 AAAAAATAAGGCATGAGAAAGCATTGACATCTTCACTTAAGAACTGAACAAAAGAGAACA 1290

Qy  1261 AATATTGTCAATGTTTGGACACTTAGGATCTGAAATCTTGGAAATTTTAAGACCTCTTTT 1320
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1291 AATATTGTCAATGTTTGGACACTTAGGATCTGAAATCTTGGAAATTTTAAGACCTCTTTT 1350

Qy  1321 TCTATCAGTGTAAAAGGAATACAAGATAGCTAGTTGCAAATGCTGAATGCATTTCATCAT 1380
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1351 TCTATCAGTGTAAAAGGAATACAAGATAGCTAGTTGCAAATGCTGAATGCATTTCATCAT 1410

Qy  1381 TGGTCAGGTCGATAAGCGTGTTTCTGAAATAGTCTTATTTTTATTCTTGTAATATTAAAA 1440
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1411 TGGTCAGGTCGATAAGCGTGTTTCTGAAATAGTCTTATTTTTATTCTTGTAATATTAAAA 1470

Qy  1441 TTTATGTGAAAAATGAATATAATTCAATGTACAACATTAGATTTTCTATTTGAAAATTAT 1500
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1471 TTTATGTGAAAAATGAATATAATTCAATGTACAACATTAGATTTTCTATTTGAAAATTAT 1530

Qy  1501 ATTTCTTGAAAAAATAACTGCTGTGCCTAAATAAATCAATATA 1543
         |||||||||||||||||||||||||||||||||||||||||||
Db  1531 ATTTCTTGAAAAAATAACTGCTGTGCCTAAATAAATCAATATA 1573



RESULT 3 

AC138318/C 

LOCUS AC138318 184306 bp DNA linear HTG 22-MAR-2003
DEFINITION Mus musculus chromosome 3 clone RP23-358I23 map 3, WORKING DRAFT
SEQUENCE, 10 unordered pieces.
ACCESSION AC138318
VERSION AC138318.3 GI:29150501
KEYWORDS HTG; HTGS_PHASE1; HTGS_DRAFT.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 184306)
AUTHORS Birren,B., Nusbaum,C. and Lander,E.
TITLE Mus musculus chromosome 3, clone RP23-358I23
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 184306)
AUTHORS Birren,B., Nusbaum,C., Lander,E., Ali,A., Allen,N., Anderson,S.,
Barna,N., Bastien,V., Bloom,T., Boguslavkiy,L., Boukhgalter,B.,
Camarata,J., Chang,J., Chazaro,B., Choepel,Y., Collymore,A.,
Cook,A., Cooke,P., DeArellano,K., Dewar,K., Diaz,J.S., Dodge,S.,
Faro,S., Ferreira,P., FitzGerald,M., Gage,D., Galagan,J.,
Gardyna,S., Gord,S., Graham,L., Grand-Pierre,N., Hafez,N.,
Hagos,B., Horton,L., Hulme,W., Iliev,I., Johnson,R., Jones,C.,
Kamat,A., Karatas,A., Kells,C., Landers,T., Levine,R.,
Lindblad-Toh,K., Liu,G., MacLean,C., Macdonald,P., Major,J.,
Matthews,C., McCarthy,M., Meldrim,J., Meneus,L., Mihova,T.,
Mlenga,V., Murphy,T., Naylor,J., Nguyen,C., Nicol,R., Norbu,C.,
Norman,C.H., O'Connor,T., O'Donnell,P., O'Neil,D., Oliver,J.,
Peterson,K., Phunkhang,P., Pierre,N., Raymond,C., Retta,R.,
Rise,C., Rogov,P., Roman,J., Roy,A., Schauer,S., Schupback,R.,
Seaman,S., Severy,P., Smith,C., Spencer,B., Stange-Thomann,N.,
Stojanovic,N., Talamas,J., Tesfaye,S., Theodore,J., Topham,K.,
Travers,M., Vassiliev,H., Viel,R., Vo,A., Wilson,B., Wu,X.,
Wyman,D., Young,G., Zainoun,J., Zembek,L., Zimmer,A. and Zody,M.
TITLE Direct Submission 

JOURNAL Submitted (25-DEC-2002) Whitehead Institute/MIT Center for Genome
Research, 320 Charles Street, Cambridge, MA 02141, USA 
REFERENCE 3 (bases 1 to 184306) 

AUTHORS Birren,B., Nusbaum,C., Lander,E., Abouelleil,A., Allen,N.,
Anderson,S., Arachchi,H.M., Barna,N., Bastien,V., Bloom,T.,
Boguslavkiy,L., Boukhgalter,B., Camarata,J., Chang,J., Choepel,Y.,
Collymore,A., Cook,A., Cooke,P., Corum,B., DeArellano,K.,
Diaz,J.S., Dodge,S., Dooley,K., Dorris,L., Erickson,J., Faro,S.,
Ferreira,P., FitzGerald,M., Gage,D., Galagan,J., Gardyna,S.,
Graham,L., Grand-Pierre,N., Hafez,N., Hagopian,D., Hagos,B.,
Hall,J., Horton,L., Hulme,W., Iliev,I., Johnson,R., Jones,C.,
Kamat,A., Karatas,A., Kells,C., Landers,T., Levine,R.,
Lindblad-Toh,K., Liu,G., Lui,A., Mabbitt,R., MacLean,C.,
Macdonald,P., Major,J., Manning,J., Matthews,C., McCarthy,M.,
Meldrim,J., Meneus,L., Mihova,T., Mlenga,V., Murphy,T., Naylor,J.,
Nguyen,C., Nicol,R., Norbu,C., O'Connor,T., O'Donnell,P.,
O'Neil,D., Oliver,J., Peterson,K., Phunkhang,P., Pierre,N.,
Rachupka,A., Ramasamy,U., Raymond,C., Retta,R., Rise,C., Rogov,P.,
Roman,J., Schauer,S., Schupback,R., Seaman,S., Severy,P., Smith,C.,
Spencer,B., Stange-Thomann,N., Stojanovic,N., Stubbs,M.,
Talamas,J., Tesfaye,S., Theodore,J., Topham,K., Travers,M.,
Vassiliev,H., Venkataraman,V.S., Viel,R., Vo,A., Wilson,B., Wu,X.,
Wyman,D., Young,G., Zainoun,J., Zembek,L., Zimmer,A. and Zody,M.

TITLE Direct Submission 

JOURNAL Submitted (22-MAR-2003) Whitehead Institute/MIT Center for Genome
Research, 320 Charles Street, Cambridge, MA 02141, USA 
COMMENT On Mar 22, 2003 this sequence version replaced gi:28191615. 

All repeats were identified using RepeatMasker:
Smit, A.F.A. & Green, P. (1996-1997)
http://ftp.genome.washington.edu/RM/RepeatMasker.html
Genome Center
Center: Whitehead Institute/MIT Center for Genome Research
Center code: WIBR
Web site: http://www-seq.wi.mit.edu
Contact: sequence_submissions@genome.wi.mit.edu

Project Information 

Center project name: L28921 
Center clone name: 358_I_23

Summary Statistics 

Sequencing vector: Plasmid; n/a; 100% of reads 
Chemistry: Dye-terminator Big Dye; 100% of reads 
Assembly program: Phrap; version 0.960731 
Consensus quality: 181695 bases at least Q40 
Consensus quality: 182410 bases at least Q30 
Consensus quality: 182638 bases at least Q20 
Insert size: 170000; agarose-fp 
Insert size: 183406; sum-of-contigs 
Quality coverage: 12.8 in Q20 bases; agarose-fp 
Quality coverage: 11.9 in Q20 bases; sum-of-contigs 



NOTE: This is a 'working draft' sequence. It currently 
consists of 10 contigs. The true order of the pieces
is not known and their order in this sequence record is 
arbitrary. Gaps between the contigs are represented as 
runs of N, but the exact sizes of the gaps are unknown. 
This record will be updated with the finished sequence 
as soon as it is available and the accession number will 
be preserved. 

1 58928: contig of 58928 bp in length
58929 59028: gap of 100 bp
59029 59675: contig of 647 bp in length
59676 59775: gap of 100 bp
59776 64745: contig of 4970 bp in length
64746 64845: gap of 100 bp
64846 69474: contig of 4629 bp in length
69475 69574: gap of 100 bp
69575 73672: contig of 4098 bp in length
73673 73772: gap of 100 bp
73773 81217: contig of 7445 bp in length
81218 81317: gap of 100 bp
81318 97238: contig of 15921 bp in length
97239 97338: gap of 100 bp
97339 113368: contig of 16030 bp in length
113369 113468: gap of 100 bp
113469 166137: contig of 52669 bp in length
166138 166237: gap of 100 bp
166238 184306: contig of 18069 bp in length.

FEATURES Location/Qualifiers
source 1..184306
/organism="Mus musculus"
/mol_type="genomic DNA"
/db_xref="taxon:10090"
/chromosome="3"
/map="3"
/clone="RP23-358I23"
/clone_lib="RPCI-23 Female Mouse BAC"
misc_feature 1..58928
/note="assembly_fragment
clone_end:SP6
vector_side:left"
misc_feature 59029..59675
/note="assembly_fragment"
misc_feature 59776..64745
/note="assembly_fragment"
misc_feature 64846..69474
/note="assembly_fragment"
misc_feature 69575..73672
/note="assembly_fragment"
misc_feature 73773..81217
/note="assembly_fragment"
misc_feature 81318..97238
/note="assembly_fragment"
misc_feature 97339..113368
/note="assembly_fragment"
misc_feature 113469..166137
/note="assembly_fragment"
misc_feature 166238..184306
/note="assembly_fragment
clone_end:T7
vector_side:right"
BASE COUNT 58167 a 34380 c 35771 g 55088 t 900 others 
ORIGIN 



Query Match 96.9%; Score 1494.8; DB 2; Length 184306; 

Best Local Similarity 99.9%; Pred. No. 0; 

Matches 1496; Conservative 0; Mismatches 2; Indels 0; Gaps 0;



Qy      46 GGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTA 105
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  145736 GGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTA 145677

Qy     106 CCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGT 165
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  145676 CCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGT 145617

Qy     166 GTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCT 225
           ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  145616 GTTTGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCT 145557

Qy     226 TTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAA 285
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  145556 TTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAA 145497

Qy     286 TGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAA 345
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  145496 TGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAA 145437

Qy     346 CCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAA 405
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  145436 CCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAA 145377

Qy     406 GTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGC 465
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  145376 GTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGC 145317

Qy     466 TGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCC 525
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  145316 TGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCC 145257

Qy     526 AAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAACACAATCT 585
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  145256 AAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAACACAATCT 145197

Qy     586 CATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTT 645
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  145196 CATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTT 145137

Qy     646 CTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCT 705
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  145136 CTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCT 145077

Qy     706 GCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTT 765
           |||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||
Db  145076 GCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCAGTTGTGATCTTCTCTATACTCTT 145017

Qy     766 CACACCCTATCATATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTTGGCCACA 825
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  145016 CACACCCTATCATATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTTGGCCACA 144957

Qy     826 AGGATGTACACAGAAGGCCATCAAATCTATATACACACTGACACGGCCTCTGGCCTTTCT 885
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  144956 AGGATGTACACAGAAGGCCATCAAATCTATATACACACTGACACGGCCTCTGGCCTTTCT 144897

Qy     886 GAACAGTGCCATCAATCCCATCTTCTACTTCCTCATGGGAGACCATTACAGAGAGATGCT 945
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  144896 GAACAGTGCCATCAATCCCATCTTCTACTTCCTCATGGGAGACCATTACAGAGAGATGCT 144837

Qy     946 GATTAGTAAGTTCAGACAATACTTCAAGTCCCTTACATCCTTCAGGACATGAGCTGCTGG 1005
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  144836 GATTAGTAAGTTCAGACAATACTTCAAGTCCCTTACATCCTTCAGGACATGAGCTGCTGG 144777

Qy    1006 ATGCAGGTCTTCACTCAGCCAAAATGAGACACTTGATAAACAGTGCTGTGCAGTTGAGTT 1065
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  144776 ATGCAGGTCTTCACTCAGCCAAAATGAGACACTTGATAAACAGTGCTGTGCAGTTGAGTT 144717

Qy    1066 TTAACTAAGTAAACCACCATTTCTAGGCTTTAGCTTTCCACCATCCTCCAACCCCCAGGG 1125
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  144716 TTAACTAAGTAAACCACCATTTCTAGGCTTTAGCTTTCCACCATCCTCCAACCCCCAGGG 144657

Qy    1126 CTGGAGTACAAGCTGGGTCCACATGAATCAGAAGGCAGCTCTCTGTTCTGATTTTAGGTT 1185
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  144656 CTGGAGTACAAGCTGGGTCCACATGAATCAGAAGGCAGCTCTCTGTTCTGATTTTAGGTT 144597

Qy    1186 ATACCCAGAGTATGGAAAAAATAAGGCATGAGAAAGCATTGACATCTTCACTTAAGAACT 1245
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  144596 ATACCCAGAGTATGGAAAAAATAAGGCATGAGAAAGCATTGACATCTTCACTTAAGAACT 144537

Qy    1246 GAACAAAAGAGAACAAATATTGTCAATGTTTGGACACTTAGGATCTGAAATCTTGGAAAT 1305
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  144536 GAACAAAAGAGAACAAATATTGTCAATGTTTGGACACTTAGGATCTGAAATCTTGGAAAT 144477

Qy    1306 TTTAAGACCTCTTTTTCTATCAGTGTAAAAGGAATACAAGATAGCTAGTTGCAAATGCTG 1365
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  144476 TTTAAGACCTCTTTTTCTATCAGTGTAAAAGGAATACAAGATAGCTAGTTGCAAATGCTG 144417

Qy    1366 AATGCATTTCATCATTGGTCAGGTCGATAAGCGTGTTTCTGAAATAGTCTTATTTTTATT 1425
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  144416 AATGCATTTCATCATTGGTCAGGTCGATAAGCGTGTTTCTGAAATAGTCTTATTTTTATT 144357

Qy    1426 CTTGTAATATTAAAATTTATGTGAAAAATGAATATAATTCAATGTACAACATTAGATTTT 1485
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  144356 CTTGTAATATTAAAATTTATGTGAAAAATGAATATAATTCAATGTACAACATTAGATTTT 144297

Qy    1486 CTATTTGAAAATTATATTTCTTGAAAAAATAACTGCTGTGCCTAAATAAATCAATATA 1543
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  144296 CTATTTGAAAATTATATTTCTTGAAAAAATAACTGCTGTGCCTAAATAAATCAATATA 144239
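
The summary figures printed for RESULT 3 can be cross-checked with a few lines of arithmetic. The sketch below is not part of the search program's output. It assumes that "Query Match" is the hit score divided by the perfect score (1543, from the report header) and that "Best Local Similarity" is the number of matches divided by the number of aligned columns (matches + mismatches + indels). Both definitions are inferred from the reported numbers, and the function names are hypothetical.

```python
# Hypothetical helpers; the formulas are assumptions inferred from the report,
# not documented definitions from the search software.

PERFECT_SCORE = 1543  # "Perfect score" from the report header


def query_match_pct(score, perfect_score=PERFECT_SCORE):
    """Score as a percentage of the perfect (self-match) score."""
    return 100.0 * score / perfect_score


def local_similarity_pct(matches, mismatches, indels):
    """Matches as a percentage of all aligned columns."""
    aligned_columns = matches + mismatches + indels
    return 100.0 * matches / aligned_columns


# RESULT 3: Score 1494.8; Matches 1496; Mismatches 2; Indels 0
print(round(query_match_pct(1494.8), 1))            # 96.9
print(round(local_similarity_pct(1496, 2, 0), 1))   # 99.9
```

Under these assumptions the same two formulas also reproduce the 67.5% and 85.2% reported for RESULT 4 (Score 1041.6; Matches 1287; Mismatches 194; Indels 29).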



RESULT 4 

AC111231/c

LOCUS AC111231 239576 bp DNA linear HTG 13-MAY-2003
DEFINITION Rattus norvegicus clone CH230-96O13, *** SEQUENCING IN PROGRESS
2 unordered pieces.
ACCESSION AC111231
VERSION AC111231.7 GI:30578486
KEYWORDS HTG; HTGS_PHASE1; HTGS_DRAFT; HTGS_ENRICHED.
SOURCE Rattus norvegicus (Norway rat)
ORGANISM Rattus norvegicus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae;
Rattus.
REFERENCE 1 (bases 1 to 239576)
AUTHORS Muzny,D.Marie., Metzker,M.Lee., Abramzon,S., Adams,C., Alder,J.,
Allen,C., Allen,H., Alsbrooks,S., Amin,A., Anguiano,D.,
Anyalebechi,V., Aoyagi,A., Ayodeji,M., Baca,E., Baden,H.,
Baldwin,D., Bandaranaike,D., Barber,M., Barnstead,M., Benahmed,F.,
Biswalo,K., Blair,J., Blankenburg,K., Blyth,P., Brown,M.,
Bryant,N., Buhay,C., Burch,P., Burrell,K., Calderon,E.,
Cardenas,V., Carter,K., Cavazos,I., Ceasar,H., Center,A.,
Chacko,J., Chavez,D., Chen,G., Chen,R., Chen,Y., Chen,Z., Chu,J.,
Cleveland,C., Cockrell,R., Cox,C., Coyle,M., Cree,A., D'Souza,L.,
Davila,M.L., Davis,C., Davy-Carroll,L., De Anda,C., Dederich,D.,
Delgado,O., Denson,S., Deramo,C., Ding,Y., Dinh,H., Divya,K.,
Draper,H., Dugan-Rocha,S., Dunn,A., Durbin,K., Duval,B., Eaves,K.,
Egan,A., Escotto,M., Eugene,C., Evans,C.A., Falls,T., Fan,G.,
Fernandez,S., Finley,M., Flagg,N., Forbes,L., Foster,M., Foster,P.,
Fraser,C.M., Gabisi,A., Ganta,R., Garcia,A., Garner,T., Garza,M.,
Gebregeorgis,E., Geer,K., Gill,R., Grady,M., Guerra,W., Guevara,W.,
Gunaratne,P., Haaland,W., Hamil,C., Hamilton,C., Hamilton,K.,
Harvey,Y., Havlak,P., Hawes,A., Henderson,N., Hernandez,J.,
Hernandez,R., Hines,S., Hladun,S.L., Hodgson,A., Hogues,M.,
Hollins,B., Howells,S., Hulyk,S., Hume,J., Idlebird,D., Jackson,A.,



Jackson,L., Jacob,L., Jiang,H., Johnson,B., Johnson,R., Jolivet,A.,
Karpathy,S., Kelly,S., Kelly,S., Khan,Z., King,L., Kovar,C.,
Kowis,C., Kraft,C.L., Lebow,H., Levan,J., Lewis,L., Li,Z., Liu,J.,
Liu,J., Liu,W., Liu,Y., London,P., Longacre,S., Lopez,J.,
Lorensuhewa,L., Loulseged,H., Lozado,R.J., Lu,X., Ma,J.,
Maheshwari,M., Mahindartne,M., Mahmoud,M., Malloy,K., Mangum,A.,
Mangum,B., Mapua,P., Martin,K., Martin,R., Martinez,E.,
Mawhiney,S., McLeod,M.P., McNeill,T.Z., Meenen,E.,
Milosavljevic,A., Miner,G., Minja,E., Montemayor,J., Moore,S.,
Morgan,M., Morris,K., Morris,S., Munidasa,M., Murphy,M., Nair,L.,
Nankervis,C., Neal,D., Newton,N., Nguyen,N., Norris,S.,
Nwaokelemeh,O., Okwuonu,G., Olarnpunsagoon,A., Pal,S., Parks,K.,
Pasternak,S., Paul,H., Perez,A., Perez,L., Pfannkoch,C.,
Plopper,F., Poindexter,A., Popovic,D., Primus,E., Pu,L.-L.,
Puazo,M., Quiroz,J., Rachlin,E., Reeves,K., Regier,M.A., Reigh,R.,
Reilly,B., Reilly,M., Ren,Y., Reuter,M., Richards,S., Riggs,F.,
Rives,C., Rodkey,T., Rojas,A., Rose,M., Rose,R., Ruiz,S.J.,
Sanders,W., Savery,G., Scherer,S., Scott,G., Shatsman,S., Shen,H.,
Shetty,J., Shvartsbeyn,A., Sisson,I., Sitter,C.D., Smajs,D.,
Sneed,A., Sodergren,E., Song,X.-Z., Sorelle,R., Sosa,J.,
Steimle,M., Strong,R., Sutton,A., Svatek,A., Tabor,P., Taylor,C.,
Taylor,T., Thomas,N., Thomas,S., Tingey,A., Trejos,Z., Usmani,K.,
Valas,R., Vera,V., Villasana,D., Waldron,L., Walker,B., Wang,J.,
Wang,Q., Wang,S., Warren,J., Warren,R., Wei,X., White,F.,
Williams,G., Willson,R., Wleczyk,R., Wooden,H., Worley,K.,
Wright,D., Wright,R., Wu,J., Yakub,S., Yen,J., Yoon,L., Yoon,V.,
Yu,F., Zhang,J., Zhou,J., Zhou,X., Zhao,S., Dunn,D., von
Niederhausern,A., Weiss,R., Smith,D.R., Holt,R.A., Smith,H.O.,
Weinstock,G. and Gibbs,R.A.
TITLE Direct Submission
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 239576)
AUTHORS Worley,K.C.
TITLE Direct Submission
JOURNAL Submitted (19-FEB-2002) Human Genome Sequencing Center, Department
of Molecular and Human Genetics, Baylor College of Medicine, One
Baylor Plaza, Houston, TX 77030, USA
REFERENCE 3 (bases 1 to 239576)
AUTHORS Rat Genome Sequencing Consortium.
TITLE Direct Submission
JOURNAL Submitted (13-MAY-2003) Human Genome Sequencing Center, Department
of Molecular and Human Genetics, Baylor College of Medicine, One
Baylor Plaza, Houston, TX 77030, USA
COMMENT On May 13, 2003 this sequence version replaced gi:24819079.
The sequence in this assembly is a combination of BAC based reads
and whole genome shotgun sequencing reads assembled using Atlas
(http://www.hgsc.bcm.tmc.edu/projects/rat/). Each contig described
in the feature table below represents a scaffold in the Atlas
assembly (a 'contig-scaffold'). Within each contig-scaffold,
individual sequence contigs are ordered and oriented, and separated
by sized gaps filled with Ns to the estimated size. The sequence
may extend beyond the ends of the clone and there may be sequence
contigs within a contig-scaffold that consist entirely of whole
genome shotgun sequence reads. Both end sequences and whole genome
shotgun sequence only contigs will be indicated in the feature
table.

Genome Center
Center: Baylor College of Medicine
Center code: BCM
Web site: http://www.hgsc.bcm.tmc.edu/
Contact: hgsc-help@bcm.tmc.edu
Project Information
Center project name: GLVO
Center clone name: CH230-96O13
Summary Statistics
Assembly program: Atlas 3.0;
Consensus quality: 213738 bases at least Q40
Consensus quality: 217471 bases at least Q30
Consensus quality: 220066 bases at least Q20
Estimated insert size: 227472; sum-of-contigs estimation
Quality coverage: 6x in Q20 bases; sum-of-contigs estimation

NOTE: Estimated insert size may differ from sequence length
(see http://www.hgsc.bcm.tmc.edu/docs/Genbank_draft_data.html)
NOTE: This is a 'working draft' sequence. It currently
consists of 2 contigs. The true order of the pieces
is not known and their order in this sequence record is
arbitrary. Gaps between the contigs are represented as
runs of N, but the exact sizes of the gaps are unknown.
This record will be updated with the finished sequence
as soon as it is available and the accession number will
be preserved.

1 236521: contig of 236521 bp in length
236522 236621: gap of unknown length
236622 239576: contig of 2955 bp in length.

FEATURES Location/Qualifiers
source 1..239576
/organism="Rattus norvegicus"
/mol_type="genomic DNA"
/db_xref="taxon:10116"
/clone="CH230-96O13"
misc_feature 157219..158900
/note="wgs_contig"
misc_feature 206334..207349
/note="wgs_contig"
BASE COUNT 67727 a 42048 c 43435 g 68312 t 18054 others
ORIGIN



Query Match 67.5%; Score 1041.6; DB 2; Length 239576;
Best Local Similarity 85.2%; Pred. No. 1.3e-221;
Matches 1287; Conservative 0; Mismatches 194; Indels 29; Gaps 10;



Qy 4 6 GGCACAGAATTTAT CTT GT GAGAATTGGTTGGCAACAGAGGCTAT CTT GAATAAGTACTA 105 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I I I I 
Db 92574 GGCAC AGAAT T TAT CTT GT GAAAATT GGCT GGC AT T AGAGAAT AT T T T GAAAAAGT AC T A 92515 



Qy 106 CCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGT 165 

I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I III I I I I I I I I I 
Db 92514 CCTCTCTGCATTTTATGGGATCGAGTTCATTGTTGGAATGCTTGGCAATTTCACCGTGGT 92455 



Qy 166 GTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCT 225

I I I I I I i I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I II I I I I I
Db 92454 GTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGTAGCTUVCGTCTATCTCTTCAACCT 92395



Qy 226 TTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAA 285

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | | | | I I I I I I I I I 

Db 92394 TTCCATCTCTGACCTTGCTTTCCTGTGCACGCTTCCCATGCTGATAAGGAGTTACGCCAC 92335 

Qy 286 T GATAAGGGGAC CT AT GGAGAT GT T CT CT GT ATAAGCAAC C GAT AT GT GCT T CACAC CAA 345 

II II I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I 
Db 92334 T G GGAACT GGAC CT AT GGAGAT GT T CT CT GCATAAGCAAC C GTT AT GT GCT T CAT G C CAA 92275 

Qy 346 C CT CT AC AC C AGCAT CCTCTTCCT C ACTT T CAT T AGCAT GGACC GAT AT C T GC TCAT GAA 4 05 

I I I I I I I M I M I I I I I I I I I I I I I I I I I I I I | | | | | | | | | | | | | | | | | | | | | | | | | | 

Db 92274 C CT CT AC AC C AG CAT CCTTTTCCT CAC T T T CAT T AGCAT AGACC GAT AT CT GCT CAT GAA 92215 

Qy 4 06 GT ACC C T T T C C GAGAACACT T T CT ACAAAAGAAGGAAT T T GC C AT TT T AAT CTCGCTGGC 465 

II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I | | | | | | | | | | | | Mill 

Db 92214 GT TCCCTTTCC GAGAACAC ATT CT ACAAAAGAAGGAATT T GC CAT T T T AAT CTCCCTGGC 92155 

Qy 466 TGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCC 525

I I I I I I M I I I I I I I I I I I I II I I I I I I I I I I I I M I I I I II I I I I Ml IN 
Db 92154 TGTCTGGGTCTTAGTGACCTTAGAAGTTCTACCTATGCTCACGTTTATCACTTCCACCCC 92 095 

Qy 526 AAAAGAAGAGGG C AGTAACT GC AT C GACT AT G CAAGTT CT GGAAAC C CT GAAC ACAAT CT 585 

M I I I I I I M I I III. I I M I I I I I I I I I I I I | | | | | | | | | | || || I III 
Db 92 094 AAT AGAAAAGGGC GAC AGCTGT GT C GAC T AT GCAAGT T C T GGAAAC C CT AAATACAGT CT 92 035 

Qy 586 CATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTT 645

I I I M I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I II I I I I I 
Db 92034 CATTTACAGCCTGTGCCTGACTTTGCTGGGCTTCCTCATTCCTCTGTCTGTAATGTGCTT 91975 

Qy 646 CTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCT 705

I I I I I I 1 M I II I II I I II I I I II I I II I I I I I I I I II I I I I I I I I I I I I I I II 
Db 91974 CTTCTACTACAAAATGGTAGTCTTCCTAAAGAAGAGGAGCCAGCAGCAGGCAACTGTGCT 91915 

Qy 706 GCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTT 765

I Ml I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
Db 91914 ATCGCTGAACAAACCTCTGCGCCTGGTGGTCCTGGCAGTGGTGATCTTCTCTGTACTCTT 91855 

Qy 766 CACAC C CT AT CAT AT CAT GC GCAATT T GAGGAT C GCCT CAC GC CT GGAT AGT T GGC C AC A 825 

Mill II I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 91854 T AC AC CT T AC CAT AT CAT GC GCAAT GT GAGGAT T GC CT C AC GCT T G GAT AGCT GGC CACA 91795 

Qy 82 6 AGGAT GT ACACAGAAGGCCATCAAAT CTATATACACACT GACAC GGCCT CTGGCCTTTCT 885 

I I I I I I I MINIUM I I I I I | | | | | | | | || | | | | | | | | | | 

Db 91794 GGGAT GTT CCCAGAAGGCCATCAAATGCTTATACATCCTGACCAGACCT CTGGCCTTTCT 91735 

Qy 88 6 GAAC AGT GC CAT CAAT C C CAT CT T CT ACT T C CT CAT GGGAGAC CATT ACAGAGAGAT GCT 945 

I I I I I I I I I I I I I II I I I I I I I I I I I II I | I I I I I I I I I I I I I I | I | | | | | | | 
Db 91734 GAAC AGT GCT GT CAAC C C CAT CT TC T AC TTC CT T GT GGGAGAC CAT T T C AGAGAC AT GCT 91675 

Qy 94 6 GAT T AGT AAGT T CAGACAATACT T CAAGT CC CT T ACAT C CTT C AGGAC AT GAGCT GCT GG 1005 

I M M I I I I I I I I I I I I I I I I I I I I I I I I I II I | I I I I I I I I I Ml | | 
Db 91674 GTTTAGTAAGTTGAGACAATACTTCAAGTCCCTTACGTCCTTCAGGCTCTGACCT A 91619 

Qy 1006 AT G C AG GT C T T C ACT C AGC CAAAAT GAGACACT T GATAAAC AGT GCT GT GC AGT T GAGTT 1065 

Ml I M I I I I II I I I I I I I Ml | | | | | || | M | | | | | | | | | 

Db 91618 AT GTAGGT CTT CACT GAGC CAGAATAAGACT C AACTCTGCAGTT GAGTT 91570 

Qy 1066 TTAACTAAGTAAACCACCATTTCTAGGCTTTAGC-TTTCCACCATCCTCCAACCCCCAGG 1124 



II II 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 

Db 91569 T T GAC CAAGT AGAC C ACC AC CT C T AGGCT T TAGC GTT C C C AC CAT C CT C CAAC C C T GAGT 91510 

Qy 1125 GCTGGAGTACAAGCTGGGTCCACATGAATCAGAAG-GCAGCTCTCTGTTCTGATTTTAGG 1183 

Ml I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I II I I I 

Db 91509 G CT AGAGCACAAAC T GGGCACAC AT GAAT CAGAAGAGCAAC CAT CT GT C C C GAT T TTAGG 91450 

Qy 1184 TTATACCCAGAGTATGGAAAAAATAAGGCAT GAGAAAGCATT GACATCTT CACTTAAGAA 1243 

I II I I I I I I I II I I I I I I I II MM I I I I I I I I I I II II I I I I I I I I I I I I I 
Db 91449 CTGTACCCAGAGTAT GG- AAAAAT GAGGCCC CAGAAAGCATT GACATCTTCACATAAGAA 91391 

Qy 1244 CT GAACAAAAGAGAACAAATATT GT CAATGTTT GGACACTTAGGATCTGAAATCTTGGAA 1303 

I I I I I I I I I I I I III II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 91390 CT GAACAAAAGAAAACT GAT GT T GT CAAT AT T T GGAC ACTTAAGATC CAAGGC GTT GGAG 91331 

Qy 1304 AT T T TAAGAC CT CT T~ T TT CT AT C AGT GTAAAAGGAAT ACAAGATAGCT AGT T G CAAAT G 1362 

I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I III I I I I II I I I I 
Db 91330 ATT TT AAGAC AT CT T CT T T C TAT C AGT GTAAAAGGAAT AC GAGACAGCTAGT T - CT GAC A 91272 

Qy 1363 CT GAAT GCATT T CAT CAT T GGT C AG GT C GAT AAG C GT GT T T CT GAAAT AGT C TTAT 1418 

I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | | M | | I I I I 

Db 91271 CT GAAT GCAT T T T GT CAT T GGT C AGCTT GATAAGAAT GT T T CT GAAAT AGT C T CT AT TAT 91212 

Qy 1419 TTTT ATT CTT GT AATAT TAA-AATTTAT GT GAAAAAT GAAT ATAATT CAATGTACAACAT 1477 

I I I I I I I I II I I I I I I I I I Mill | | | | Ml M II I I I I I I I I I I I I I I 
Db 91211 T T T T ATT CT T G CAAT AT TAAC CT TT T AT AT GAAT GGT GAGT AGAACT CAAT GT ACAAC AT 91152 

Qy 14 7 8 T AGAT TTT CT ATT T GAAAATT AT ATT T CTT GAAAA AATAACTGCTGTGCCTAAATA 1533 

I M I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I Ml I I I I I 

Db 91151 TAGCAATTATATTCAGAAAGTACATTT CTT GAAAAAAT GAATAACT GCAAT GCCTAAATA 91092 

Qy 1534 AAT CAAT AT A 1543 

I I I I I I I I 
Db 91091 AAT CAAC AC A 91082 
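
In RESULT 3 and RESULT 4 the Db coordinates decrease from row to row, which suggests that these hits (listed with a "/c" suffix) lie on the complementary strand of the database sequence. That reading is an inference from the coordinates, not something the report states. Under that assumption, the end coordinate of any Db row can be checked from its start coordinate and the number of bases in the row, ignoring gap characters. The helper names below are hypothetical.

```python
# Hypothetical helpers for checking alignment row coordinates.
# Assumption: hits marked "/c" run along the reverse strand, so Db
# coordinates decrease by one for every base (gap characters consume none).

def count_bases(row):
    """Count sequence letters in an alignment row, skipping gaps and spaces."""
    return sum(1 for ch in row.upper() if ch in "ACGTN")


def row_end(start, row, reverse=False):
    """Coordinate of the last base in a row that begins at `start`."""
    span = count_bases(row) - 1
    return start - span if reverse else start + span


# RESULT 4, final row: Db 91091 AATCAACACA 91082
print(row_end(91091, "AATCAACACA", reverse=True))  # 91082

# RESULT 4 row containing a one-base gap in the Db line (59 bases, 60 columns):
gapped = "CTGTACCCAGAGTATGG-AAAAATGAGGCCCCAGAAAGCATTGACATCTTCACATAAGAA"
print(row_end(91449, gapped, reverse=True))        # 91391
```

Because spaces are skipped as well, the same check can be run directly on rows copied from the report with their stray spacing intact.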
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AC116149 60298 bp DNA linear HTG 25-MAR-2002 

Mus musculus clone RP24-540E9, LOW-PASS SEQUENCE SAMPLING. 
AC116149 

AC116149.1 GI:19703273
HTG; HTGS_PHASE0.
Mus musculus (house mouse) 
Mus musculus 
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TITLE Direct Submission 
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Web site: http://www-seq.wi.mit.edu
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Project Information 

Center project name: L24912 
Center clone name: 540_E_9



* NOTE: This record contains 77 individual
* sequencing reads that have not been assembled into
* contigs. Runs of N are used to separate the reads
* and the order in which they appear is completely
* arbitrary. Low-pass sequence sampling is useful for
* identifying clones that may be gene-rich and allows
* overlap relationships among clones to be deduced.
* However, it should not be assumed that this clone
* will be sequenced to completion. In the event that
* the record is updated, the accession number will
* be preserved.
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Query Match 41.9%; Score 645.8; DB 2; Length 60298; 

Best Local Similarity 84.0%; Pred. No. 2.4e-133; 

Matches 673; Conservative 0; Mismatches 127; Indels 1; Gaps 1; 

Qy 51 AGAAT T TAT CT T GT GAGAAT T GGT T GGCAACAGAG GCT AT CT T GAAT AAGT ACT ACCT CT 110 

I I I I I I I I MMI I I I I I I I I I I I I I I I I I I I I I I I I II I I 

Db 3890 AGATCTGATATCTCGCCCTGTGGTGGAATTCTCAGGCTATCTTGAATAAGTACTACCTCT 3949

Qy 111 CTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCG 170 

I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I | | I I I I I 1 I I I I 
Db 3950 CTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTTG 4009

Qy 171 GCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCA 230

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 4010 GCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCA 4069

Qy 231 TCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATA 290 

I I I I I M I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I 
Db 4070 TCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATA 4129 

Qy 2 91 AGGGGAC CT AT G GAGAT GT T CT CT GTAT AAGCAACC GAT AT GT GCT T CACAC CAAC CT CT 350 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 4130 AGGG GAC CT AT GGAGAT GT T CT CT GT ATAAGCAAC C GAT AT GT GCTT CACAC CAAC CT C T 418 9 

Qy 351 AC AC CAG CAT CCTCTTCCT CACT TTC AT T AGC AT GGAC C GAT AT CT GCT CAT GAAGT AC C 410 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 4190 ACAC CAGC ATC CT CT T C CT C ACTT T CAT T AGCAT GGAC C GAT AT CT GCT CAT GAAGT AC C 424 9 

Qy 411 CT T T CC GAGAACAC T T T CT ACAAAAGAAGGAAT T TGC C ATT T TAAT CT CGCT G G CT GT C T 470 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 
Db 4250 CTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCT 4 309 

Qy 471 G GGC CTTAGT GAC CTTAGAAGTTCTACC CAT GCT CACT TT CAT CAATTCTGTCCCAAAAG 530 

I I I I I I I I I I I M I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 4310 GGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCCAAAAG 4369 

Qy 531 AAGAGGG CAGTAACT GCAT C GAC TAT G CAAGT T CT GGAAAC C CT GAACACAAT CT CATTT 590 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 4 37 0 AAGAGG GC AGTAACT GCAT C GAC TAT GCAAGT T C T GGAAAC C CT GAACACAAT CT C AT T T 442 9 

Qy 591 ACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCT 650 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 4430 ACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCT 4489



Qy     651 ACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCAC 710
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    4490 ACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCAC 4549



Qy 711 TGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACAC 770

I I I I I I II I I I I I I I I I I I II I I I II 
Db 4550 TGGAC-AACCCAAACGCCTGGGGGTCCTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN 4608

Qy     771 CCTATCATATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTTGGCCACAAGGAT 830

Db    4609 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN 4668

Qy     831 GTACACAGAAGGCCATCAAAT 851
                     ||  |||  ||
Db    4669 NNNNNNNNNCGGAGATCTGAT 4689
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AC116149/c 

LOCUS AC116149 60298 bp DNA linear HTG 25-MAR-2002
DEFINITION Mus musculus clone RP24-540E9, LOW-PASS SEQUENCE SAMPLING.
ACCESSION AC116149
VERSION AC116149.1 GI:19703273
KEYWORDS HTG; HTGS_PHASE0.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 60298)
AUTHORS Birren,B., Linton,L., Nusbaum,C. and Lander,E.
TITLE Mus musculus, clone RP24-540E9
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 60298)
AUTHORS Birren,B., Linton,L., Nusbaum,C., Lander,E., Ali,A., Allen,N.,
Anderson,S., Barna,N., Bastien,V., Bloom,T., Boguslavkiy,L.,
Boukhgalter,B., Brown,A., Camarata,J., Campopiano,A., Chang,J.,
Chazaro,B., Choepel,Y., Colangelo,M., Collins,S., Collymore,A.,
Cook,A., Cooke,P., DeArellano,K., Dewar,K., Diaz,J.S., Dodge,S.,
Faro,S., Ferreira,P., FitzHugh,W., Gage,D., Galagan,J., Gardyna,S.,
Ginde,S., Gord,S., Goyette,M., Graham,L., Grand-Pierre,N.,
Hagos,B., Horton,L., Hulme,W., Iliev,I., Johnson,R., Jones,C.,
Kamat,A., Karatas,A., Kells,C., LaRocque,K., Lamazares,R.,
Landers,T., Lehoczky,J., Levine,R., Lindblad-Toh,K., Liu,G.,
MacLean,C., Macdonald,P., Major,J., Marquis,N., Matthews,C.,
McCarthy,M., McEwan,P., McKernan,K., Meldrim,J., Meneus,L.,
Mihova,T., Mlenga,V., Murphy,T., Naylor,J., Nguyen,C., Nicol,R.,
Norbu,C., Norman,C.H., O'Connor,T., O'Donnell,P., O'Neil,D.,
Oliver,J., Peterson,K., Phunkhang,P., Pierre,N., Pollara,V.,
Raymond,C., Retta,R., Rieback,M., Riley,R., Rise,C., Rogov,P.,
Roman,J., Rosetti,M., Roy,A., Santos,R., Schauer,S., Schupback,R.,
Seaman,S., Severy,P., Spencer,B., Stange-Thomann,N., Stojanovic,N.,
Strauss,N., Subramanian,A., Talamas,J., Tesfaye,S., Theodore,J.,
Topham,K., Travers,M., Travis,N., Trigilio,J., Vassiliev,H.,
Viel,R., Vo,A., Wilson,B., Wu,X., Wyman,D., Ye,W.J., Young,G.,
Zainoun,J., Zembek,L., Zimmer,A. and Zody,M.
TITLE Direct Submission
JOURNAL Submitted (25-MAR-2002) Whitehead Institute/MIT Center for Genome
Research, 320 Charles Street, Cambridge, MA 02141, USA
COMMENT All repeats were identified using RepeatMasker:
Smit, A.F.A. & Green, P. (1996-1997)
http://ftp.genome.washington.edu/RM/RepeatMasker.html
Genome Center
Center: Whitehead Institute/MIT Center for Genome Research
Center code: WIBR
Web site: http://www-seq.wi.mit.edu
Contact: sequence_submissions@genome.wi.mit.edu

Project Information 

Center project name: L24912 
Center clone name: 540 E 9 



* NOTE: This record contains 77 individual 

* sequencing reads that have not been assembled into 

* contigs . Runs of N are used to separate the reads 

* and the order in which they appear is completely 

* arbitrary. Low-pass sequence sampling is useful for 

* identifying clones that may be gene-rich and allows 

* overlap relationships among clones to be deduced. 

* However, it should not be assumed that this clone 

* will be sequenced to completion. In the event that 

* the record is updated, the accession number will 

* be preserved. 





            *       1     656: contig of 656 bp in length
            *     657     756: gap of 100 bp
            *     757    1426: contig of 670 bp in length
            *    1427    1526: gap of 100 bp
            *    1527    2210: contig of 684 bp in length
            *    2211    2310: gap of 100 bp
            *    2311    2997: contig of 687 bp in length
            *    2998    3097: gap of 100 bp
            *    3098    3786: contig of 689 bp in length
            *    3787    3886: gap of 100 bp
            *    3887    4577: contig of 691 bp in length
            *    4578    4677: gap of 100 bp
            *    4678    5357: contig of 680 bp in length
            *    5358    5457: gap of 100 bp
            *    5458    6150: contig of 693 bp in length
            *    6151    6250: gap of 100 bp
            *    6251    6817: contig of 567 bp in length
            *    6818    6917: gap of 100 bp
            *    6918    7615: contig of 698 bp in length
            *    7616    7715: gap of 100 bp
            *    7716    8412: contig of 697 bp in length
            *    8413    8512: gap of 100 bp
            *    8513    9198: contig of 686 bp in length
            *    9199    9298: gap of 100 bp
            *    9299    9988: contig of 690 bp in length
            *    9989   10088: gap of 100 bp
            *   10089   10768: contig of 680 bp in length
            *   10769   10868: gap of 100 bp
            *   10869   11524: contig of 656 bp in length
            *   11525   11624: gap of 100 bp
            *   11625   12242: contig of 618 bp in length
            *   12243   12342: gap of 100 bp
            *   12343   13040: contig of 698 bp in length
            *   13041   13140: gap of 100 bp
            *   13141   13829: contig of 689 bp in length
            *   13830   13929: gap of 100 bp
            *   13930   14647: contig of 718 bp in length
            *   14648   14747: gap of 100 bp
            *   14748   15451: contig of 704 bp in length
            *   15452   15551: gap of 100 bp
            *   15552   16247: contig of 696 bp in length
            *   16248   16347: gap of 100 bp
            *   16348   17028: contig of 681 bp in length
            *   17029   17128: gap of 100 bp
            *   17129   17802: contig of 674 bp in length
            *   17803   17902: gap of 100 bp
            *   17903   18593: contig of 691 bp in length
            *   18594   18693: gap of 100 bp
            *   18694   19375: contig of 682 bp in length
            *   19376   19475: gap of 100 bp
            *   19476   20082: contig of 607 bp in length
            *   20083   20182: gap of 100 bp
            *   20183   20875: contig of 693 bp in length
            *   20876   20975: gap of 100 bp
            *   20976   21650: contig of 675 bp in length
            *   21651   21750: gap of 100 bp
            *   21751   22427: contig of 677 bp in length
            *   22428   22527: gap of 100 bp
            *   22528   23238: contig of 711 bp in length
            *   23239   23338: gap of 100 bp
            *   23339   24028: contig of 690 bp in length
            *   24029   24128: gap of 100 bp
            *   24129   24803: contig of 675 bp in length
            *   24804   24903: gap of 100 bp
            *   24904   25603: contig of 700 bp in length
            *   25604   25703: gap of 100 bp
            *   25704   26357: contig of 654 bp in length
            *   26358   26457: gap of 100 bp
            *   26458   27140: contig of 683 bp in length
            *   27141   27240: gap of 100 bp
            *   27241   27946: contig of 706 bp in length
            *   27947   28046: gap of 100 bp
            *   28047   28734: contig of 688 bp in length
            *   28735   28834: gap of 100 bp
            *   28835   29536: contig of 702 bp in length
            *   29537   29636: gap of 100 bp
            *   29637   30324: contig of 688 bp in length
            *   30325   30424: gap of 100 bp
            *   30425   31130: contig of 706 bp in length
            *   31131   31230: gap of 100 bp
            *   31231   31910: contig of 680 bp in length
            *   31911   32010: gap of 100 bp
            *   32011   32691: contig of 681 bp in length
            *   32692   32791: gap of 100 bp
            *   32792   33482: contig of 691 bp in length
            *   33483   33582: gap of 100 bp
            *   33583   34274: contig of 692 bp in length
            *   34275   34374: gap of 100 bp
            *   34375   35081: contig of 707 bp in length
            *   35082   35181: gap of 100 bp
            *   35182   35861: contig of 680 bp in length
            *   35862   35961: gap of 100 bp
            *   35962   36660: contig of 699 bp in length
            *   36661   36760: gap of 100 bp
            *   36761   37447: contig of 687 bp in length
            *   37448   37547: gap of 100 bp
            *   37548   38243: contig of 696 bp in length
            *   38244   38343: gap of 100 bp
            *   38344   39034: contig of 691 bp in length
            *   39035   39134: gap of 100 bp
            *   39135   39813: contig of 679 bp in length
            *   39814   39913: gap of 100 bp
            *   39914   40597: contig of 684 bp in length
            *   40598   40697: gap of 100 bp
            *   40698   41392: contig of 695 bp in length
            *   41393   41492: gap of 100 bp
            *   41493   42190: contig of 698 bp in length
            *   42191   42290: gap of 100 bp
            *   42291   42967: contig of 677 bp in length
            *   42968   43067: gap of 100 bp
            *   43068   43736: contig of 669 bp in length
            *   43737   43836: gap of 100 bp
            *   43837   44525: contig of 689 bp in length
            *   44526   44625: gap of 100 bp
            *   44626   45306: contig of 681 bp in length
            *   45307   45406: gap of 100 bp
            *   45407   46111: contig of 705 bp in length
            *   46112   46211: gap of 100 bp
            *   46212   46848: contig of 637 bp in length
            *   46849   46948: gap of 100 bp
            *   46949   47639: contig of 691 bp in length
            *   47640   47739: gap of 100 bp
            *   47740   48431: contig of 692 bp in length
            *   48432   48531: gap of 100 bp
            *   48532   49221: contig of 690 bp in length
            *   49222   49321: gap of 100 bp
            *   49322   50017: contig of 696 bp in length
            *   50018   50117: gap of 100 bp
            *   50118   50799: contig of 682 bp in length
            *   50800   50899: gap of 100 bp
            *   50900   51583: contig of 684 bp in length
            *   51584   51683: gap of 100 bp
            *   51684   52384: contig of 701 bp in length
            *   52385   52484: gap of 100 bp
            *   52485   53167: contig of 683 bp in length
            *   53168   53267: gap of 100 bp
            *   53268   53966: contig of 699 bp in length
            *   53967   54066: gap of 100 bp
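
The contig/gap layout in the COMMENT can be checked mechanically: each row's end minus start plus one should equal the stated length, and each row should begin one base after the previous row ends. The following is a minimal sketch of such a check, not part of the search report; the names `ROW`, `parse_layout`, `check_layout` and `SAMPLE` are hypothetical, and the row pattern assumes lines of the form `start end: contig of N bp in length` or `start end: gap of N bp`.

```python
import re

# Assumed row shape: "<start> <end>: contig of N bp ..." or "<start> <end>: gap of N bp".
ROW = re.compile(r"(\d+)\s+(\d+):\s+(contig|gap) of (\d+) bp")


def parse_layout(text):
    """Return (start, end, kind, length) tuples for each layout row found."""
    rows = []
    for line in text.splitlines():
        m = ROW.search(line)
        if m:
            rows.append((int(m.group(1)), int(m.group(2)), m.group(3), int(m.group(4))))
    return rows


def check_layout(rows):
    """List inconsistencies: wrong span length or non-contiguous coordinates."""
    problems = []
    prev_end = None
    for start, end, kind, length in rows:
        if end - start + 1 != length:
            problems.append(f"{start}-{end}: span is {end - start + 1} bp, stated {length}")
        if prev_end is not None and start != prev_end + 1:
            problems.append(f"{start}-{end}: does not follow previous end {prev_end}")
        prev_end = end
    return problems


# First four rows of the layout listed for AC116149.
SAMPLE = """\
*       1     656: contig of 656 bp in length
*     657     756: gap of 100 bp
*     757    1426: contig of 670 bp in length
*    1427    1526: gap of 100 bp
"""

rows = parse_layout(SAMPLE)
print(len(rows), check_layout(rows))
```

Rows whose gap length is given as "unknown" do not match the assumed pattern and are skipped by this sketch.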









Query Match 41.1%; Score 633.6; DB 2; Length 60298;
Best Local Similarity 97.6%; Pred. No. 1.2e-130;
Matches 664; Conservative 0; Mismatches 14; Indels 2; Gaps 2;

Qy     379 TAGCATGGACCGATATCTGCTCATGAAGTACCCTTTCCGAG-AACACTTTCTACAAAA-G 436
           ||||||||||||||||||||||||||||||||||| ||||| |||||||||||||||| |
Db   36659 TAGCATGGACCGATATCTGCTCATGAAGTACCCTTCCCGAGAAACACTTTCTACAAAANG 36600

Qy     437 AAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTA 496
           |||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||
Db   36599 AAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCTTTAGTGACCTTAGAAGTTCTA 36540

Qy     497 CCCATGCTCACTTTCATCAATTCTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTAT 556
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   36539 CCCATGCTCACTTTCATCAATTCTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTAT 36480

Qy     557 GCAAGTTCTGGAAACCCTGAACACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGC 616
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   36479 GCAAGTTCTGGAAACCCTGAACACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGC 36420

Qy     617 TTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAG 676
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   36419 TTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAG 36360

Qy     677 AGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTC 736
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   36359 AGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTC 36300

Qy     737 CTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATCATATCATGCGCAATTTGAGG 796
           ||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   36299 CTGGCAGTTGTGATCTTCTCTATACTCTTCACACCCTATCATATCATGCGCAATTTGAGG 36240

Qy     797 ATCGCCTCACGCCTGGATAGTTGGCCACAAGGATGTACACAGAAGGCCATCAAATCTATA 856
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   36239 ATCGCCTCACGCCTGGATAGTTGGCCACAAGGATGTACACAGAAGGCCATCAAATCTATA 36180

Qy     857 TACACACTGACACGGCCTCTGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTC 916
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   36179 TACACACTGACACGGCCTCTGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTC 36120

Qy     917 CTCATGGGAGACCATTACAGAGAGATGCTGATTAGTAAGTTCAGACAATACTTCAAGTCC 976
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   36119 CTCATGGGAGACCATTACAGAGAGATGCTGATTAGTAAGTTCAGACAATACTTCAAGTCC 36060

Qy     977 CTTACATCCTTCAGGACATGAGCTGCTGGATGCAGGTCTTCACTCAGCCAAAATGAGACA 1036
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   36059 CTTACATCCTTCAGGACATGAGCTGCTGGATGCAGGTCTTCACTCAGCCAAAATGAGACA 36000

Qy    1037 CTTGATAAACAGTGCTGTGC 1056
           ||    |  ||   | | ||
Db   35999 CTGAGAATCCACCACAGGGC 35980
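
The percentages in the summary lines appear to follow directly from the counts: Query Match looks like the hit score divided by the perfect score reported in the header (1543), and Best Local Similarity looks like matches divided by the number of aligned columns (matches + mismatches + indels). These formulas are an inference from the reported numbers, not something the report states, and the function names below are hypothetical. A small sketch that reproduces the figures for this hit:

```python
PERFECT_SCORE = 1543  # "Perfect score" from the report header


def query_match_pct(score, perfect_score):
    """Score as a percentage of the perfect (self-match) score. Assumed formula."""
    return 100.0 * score / perfect_score


def local_similarity_pct(matches, mismatches, indels):
    """Matches as a percentage of all aligned columns. Assumed formula."""
    return 100.0 * matches / (matches + mismatches + indels)


# Values reported for the hit against AC116149.
print(f"{query_match_pct(633.6, PERFECT_SCORE):.1f}")
print(f"{local_similarity_pct(664, 14, 2):.1f}")
```

Under the same assumptions, the counts reported for the other hits in this section (615.8 with 665/82/0, and 592.4 with 764/246/4) round to their stated percentages as well.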



RESULT 7
AC110839/c
LOCUS       AC110839   326606 bp    DNA     linear   HTG 11-OCT-2002
DEFINITION  Rattus norvegicus clone CH230-208A12, *** SEQUENCING IN PROGRESS
            25 unordered pieces.
ACCESSION   AC110839
VERSION     AC110839.4  GI:23820318
KEYWORDS    HTG; HTGS_PHASE1; HTGS_DRAFT; HTGS_ENRICHED.
SOURCE      Rattus norvegicus (Norway rat)
  ORGANISM  Rattus norvegicus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae;
            Rattus.
REFERENCE   1  (bases 1 to 326606)



  AUTHORS   Muzny,D.Marie., Metzker,M.Lee., Abramzon,S., Adams,C., Alder,J.,
            Allen,C., Allen,H., Alsbrooks,S., Amin,A., Anguiano,D.,
            Anyalebechi,V., Aoyagi,A., Ayodeji,M., Baca,E., Baden,H.,
            Baldwin,D., Bandaranaike,D., Barber,M., Barnstead,M., Benahmed,F.,
            Biswalo,K., Blair,J., Blankenburg,K., Blyth,P., Brown,M.,
            Bryant,N., Buhay,C., Burch,P., Burrell,K., Calderon,E.,
            Cardenas,V., Carter,K., Cavazos,I., Ceasar,H., Center,A.,
            Chacko,J., Chavez,D., Chen,G., Chen,R., Chen,Y., Chen,Z., Chu,J.,
            Cleveland,C., Cockrell,R., Cox,C., Coyle,M., Cree,A., D'Souza,L.,
            Davila,M.L., Davis,C., Davy-Carroll,L., DeAnda,C., Dederich,D.,
            Delgado,O., Denson,S., Deramo,C., Ding,Y., Dinh,H., Divya,K.,
            Draper,H., Dugan-Rocha,S., Dunn,A., Durbin,K., Duval,B., Eaves,K.,
            Egan,A., Escotto,M., Eugene,C., Evans,C.A., Falls,T., Fan,G.,
            Fernandez,S., Finley,M., Flagg,N., Forbes,L., Foster,M., Foster,P.,
            Fraser,C.M., Gabisi,A., Ganta,R., Garcia,A., Garner,T., Garza,M.,
            Gebregeorgis,E., Geer,K., Gill,R., Grady,M., Guerra,W., Guevara,W.,
            Gunaratne,P., Haaland,W., Hamil,C., Hamilton,C., Hamilton,K.,
            Harvey,Y., Havlak,P., Hawes,A., Henderson,N., Hernandez,J.,
            Hernandez,R., Hines,S., Hladun,S.L., Hodgson,A., Hogues,M.,
            Hollins,B., Howells,S., Hulyk,S., Hume,J., Idlebird,D., Jackson,A.,
            Jackson,L., Jacob,L., Jiang,H., Johnson,B., Johnson,R., Jolivet,A.,
            Karpathy,S., Kelly,S., Kelly,S., Khan,Z., King,L., Kovar,C.,
            Kowis,C., Kraft,C.L., Lebow,H., Levan,J., Lewis,L., Li,Z., Liu,J.,
            Liu,J., Liu,W., Liu,Y., London,P., Longacre,S., Lopez,J.,
            Lorensuhewa,L., Loulseged,H., Lozado,R.J., Lu,X., Ma,J.,
            Maheshwari,M., Mahindartne,M., Mahmoud,M., Malloy,K., Mangum,A.,
            Mangum,B., Mapua,P., Martin,K., Martin,R., Martinez,E.,
            Mawhiney,S., McLeod,M.P., McNeill,T.Z., Meenen,E.,
            Milosavljevic,A., Miner,G., Minja,E., Montemayor,J., Moore,S.,
            Morgan,M., Morris,K., Morris,S., Munidasa,M., Murphy,M., Nair,L.,
            Nankervis,C., Neal,D., Newton,N., Nguyen,N., Norris,S.,
            Nwaokelemeh,O., Okwuonu,G., Olarnpunsagoon,A., Pal,S., Parks,K.,
            Pasternak,S., Paul,H., Perez,A., Perez,L., Pfannkoch,C.,
            Plopper,F., Poindexter,A., Popovic,D., Primus,E., Pu,L.-L.,
            Puazo,M., Quiroz,J., Rachlin,E., Reeves,K., Regier,M.A., Reigh,R.,
            Reilly,B., Reilly,M., Ren,Y., Reuter,M., Richards,S., Riggs,F.,
            Rives,C., Rodkey,T., Rojas,A., Rose,M., Rose,R., Ruiz,S.J.,
            Sanders,W., Savery,G., Scherer,S., Scott,G., Shatsman,S., Shen,H.,
            Shetty,J., Shvartsbeyn,A., Sisson,I., Sitter,C.D., Smajs,D.,
            Sneed,A., Sodergren,E., Song,X.-Z., Sorelle,R., Sosa,J.,
            Steimle,M., Strong,R., Sutton,A., Svatek,A., Tabor,P., Taylor,C.,
            Taylor,T., Thomas,N., Thomas,S., Tingey,A., Trejos,Z., Usmani,K.,
            Valas,R., Vera,V., Villasana,D., Waldron,L., Walker,B., Wang,J.,
            Wang,Q., Wang,S., Warren,J., Warren,R., Wei,X., White,F.,
            Williams,G., Willson,R., Wleczyk,R., Wooden,H., Worley,K.,
            Wright,D., Wright,R., Wu,J., Yakub,S., Yen,J., Yoon,L., Yoon,V.,
            Yu,F., Zhang,J., Zhou,J., Zhou,X., Zhao,S., Dunn,D., von
            Niederhausern,A., Weiss,R., Smith,D.R., Holt,R.A., Smith,H.O.,
            Weinstock,G. and Gibbs,R.A.

  TITLE     Direct Submission
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 326606)
  AUTHORS   Worley,K.C.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-FEB-2002) Human Genome Sequencing Center, Department
            of Molecular and Human Genetics, Baylor College of Medicine, One
            Baylor Plaza, Houston, TX 77030, USA
REFERENCE   3  (bases 1 to 326606)
  AUTHORS   Rat Genome Sequencing Consortium.
  TITLE     Direct Submission
  JOURNAL   Submitted (11-OCT-2002) Human Genome Sequencing Center, Department
            of Molecular and Human Genetics, Baylor College of Medicine, One
            Baylor Plaza, Houston, TX 77030, USA
COMMENT     On Oct 11, 2002 this sequence version replaced gi:21739250.
            The sequence in this assembly is a combination of BAC based reads
            and whole genome shotgun sequencing reads assembled using Atlas
            (http://www.hgsc.bcm.tmc.edu/projects/rat/). Each contig described
            in the feature table below represents a scaffold in the Atlas
            assembly (a 'contig-scaffold'). Within each contig-scaffold,
            individual sequence contigs are ordered and oriented, and separated
            by sized gaps filled with Ns to the estimated size. The sequence
            may extend beyond the ends of the clone and there may be sequence
            contigs within a contig-scaffold that consist entirely of whole
            genome shotgun sequence reads. Both end sequences and whole genome
            shotgun sequence only contigs will be indicated in the feature
            table.
            Genome Center
            Center: Baylor College of Medicine
            Center code: BCM
            Web site: http://www.hgsc.bcm.tmc.edu/
            Contact: hgsc-help@bcm.tmc.edu
            Project Information
            Center project name: GRKD
            Center clone name: CH230-208A12
            Summary Statistics
            Assembly program: Phrap; version 0.990329
            Consensus quality: 242752 bases at least Q40
            Consensus quality: 250821 bases at least Q30
            Consensus quality: 254983 bases at least Q20
            Estimated insert size: 244968; sum-of-contigs estimation
            Quality coverage: 5x in Q20 bases; sum-of-contigs estimation
            NOTE: Estimated insert size may differ from sequence length
            (see http://www.hgsc.bcm.tmc.edu/docs/Genbank_draft_data.html)
            NOTE: This is a 'working draft' sequence. It currently
            consists of 25 contigs. The true order of the pieces
            is not known and their order in this sequence record is
            arbitrary. Gaps between the contigs are represented as
            runs of N, but the exact sizes of the gaps are unknown.
            This record will be updated with the finished sequence
            as soon as it is available and the accession number will
            be preserved.



            *       1   10356: contig of 10356 bp in length
            *   10357   10456: gap of unknown length
            *   10457   15819: contig of 5363 bp in length
            *   15820   15919: gap of unknown length
            *   15920  245368: contig of 229449 bp in length
            *  245369  245468: gap of unknown length
            *  245469  272041: contig of 26573 bp in length
            *  272042  272141: gap of unknown length
            *  272142  276368: contig of 4227 bp in length
            *  276369  276468: gap of unknown length
            *  276469  282159: contig of 5691 bp in length
            *  282160  282259: gap of unknown length
            *  282260  283432: contig of 1173 bp in length
            *  283433  283532: gap of unknown length
            *  283533  284633: contig of 1101 bp in length
            *  284634  284733: gap of unknown length
            *  284734  285764: contig of 1031 bp in length
            *  285765  285864: gap of unknown length
            *  285865  287082: contig of 1218 bp in length
            *  287083  287182: gap of unknown length
            *  287183  288399: contig of 1217 bp in length
            *  288400  288499: gap of unknown length
            *  288500  289828: contig of 1329 bp in length
            *  289829  289928: gap of unknown length
            *  289929  291274: contig of 1346 bp in length
            *  291275  291374: gap of unknown length
            *  291375  293018: contig of 1644 bp in length
            *  293019  293118: gap of unknown length
            *  293119  294732: contig of 1614 bp in length
            *  294733  294832: gap of unknown length
            *  294833  296078: contig of 1246 bp in length
            *  296079  296178: gap of unknown length
            *  296179  297942: contig of 1764 bp in length
            *  297943  298042: gap of unknown length
            *  298043  299812: contig of 1770 bp in length
            *  299813  299912: gap of unknown length
            *  299913  301595: contig of 1683 bp in length
            *  301596  301695: gap of unknown length
            *  301696  304787: contig of 3092 bp in length
            *  304788  304887: gap of unknown length
            *  304888  306249: contig of 1362 bp in length
            *  306250  306349: gap of unknown length
            *  306350  307801: contig of 1452 bp in length
            *  307802  307901: gap of unknown length
            *  307902  309454: contig of 1553 bp in length
            *  309455  309554: gap of unknown length
            *  309555  314110: contig of 4556 bp in length
            *  314111  314210: gap of unknown length
            *  314211  326606: contig of 12396 bp in length.



FEATURES             Location/Qualifiers
     source          1..326606
                     /organism="Rattus norvegicus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:10116"
                     /clone="CH230-208A12"
     misc_feature    1..1742
                     /note="wgs_end_extension
                     clone_end:Sp6"
     misc_feature    complement(4245..5082)
                     /note="clone_boundary
                     clone_end:Sp6
                     site:EcoRI
                     end_sequence:RWBKN06TVB"
     misc_feature    10457..12850
                     /note="wgs_contig"
     misc_feature    15920..16991
                     /note="wgs_contig"
     misc_feature    complement(220129..221101)
                     /note="clone_boundary
                     clone_end:T7
                     site:EcoRI
                     end_sequence:RWBKN06TJB"
     misc_feature    241580..242749
                     /note="wgs_end_extension
                     clone_end:T7"
     misc_feature    243833..245368
                     /note="wgs_end_extension
                     clone_end:T7"
BASE COUNT    81699 a  50290 c  51837 g  74097 t  68683 others
ORIGIN



Query Match 39.9%; Score 615.8; DB 2; Length 326606;
Best Local Similarity 89.0%; Pred. No. 1.2e-126;
Matches 665; Conservative 0; Mismatches 82; Indels 0; Gaps 0;

Qy      46 GGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTA 105
           ||||||||||||||||||||| |||||| |||||  ||||  ||| ||||| ||||||||
Db  242326 GGCACAGAATTTATCTTGTGAAAATTGGCTGGCATTAGAGAATATTTTGAAAAAGTACTA 242267

Qy     106 CCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGT 165
           |||||||||||||||||  |||||||||||| ||||| ||||||| ||| |||| |||||
Db  242266 CCTCTCTGCATTTTATGGGATCGAGTTCATTGTTGGAATGCTTGGCAATTTCACCGTGGT 242207

Qy     166 GTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCT 225
           |||||||||||||||||||||||||||||||||||| ||||| |||||||| || |||||
Db  242206 GTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGTAGCAACGTCTATCTCTTCAACCT 242147

Qy     226 TTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAA 285
           ||||||||||||| |||||||||||||||| |||||||| ||||||| |||||| ||||
Db  242146 TTCCATCTCTGACCTTGCTTTCCTGTGCACGCTTCCCATGCTGATAAGGAGTTACGCCAC 242087

Qy     286 TGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAA 345
           ||  ||  |||||||||||||||||||||| ||||||||||| |||||||||||  ||||
Db  242086 TGGGAACTGGACCTATGGAGATGTTCTCTGCATAAGCAACCGTTATGTGCTTCATGCCAA 242027

Qy     346 CCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAA 405
           |||||||||||||||||| |||||||||||||||||||| ||||||||||||||||||||
Db  242026 CCTCTACACCAGCATCCTTTTCCTCACTTTCATTAGCATAGACCGATATCTGCTCATGAA 241967

Qy     406 GTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGC 465
           || |||||||||||||||| |||||||||||||||||||||||||||||||||| |||||
Db  241966 GTTCCCTTTCCGAGAACACATTCTACAAAAGAAGGAATTTGCCATTTTAATCTCCCTGGC 241907

Qy     466 TGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCC 525
           |||||||| |||||||||||||||||||||||| |||||||| || |||| |||   |||
Db  241906 TGTCTGGGTCTTAGTGACCTTAGAAGTTCTACCTATGCTCACGTTTATCACTTCCACCCC 241847

Qy     526 AAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAACACAATCT 585
           || |||| |||||   | |||  |||||||||||||||||||||||||| || ||| |||
Db  241846 AATAGAAAAGGGCGACAGCTGTGTCGACTATGCAAGTTCTGGAAACCCTAAATACAGTCT 241787

Qy     586 CATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTT 645
           |||||||||||| |||||||||||| |||||||||| |||||||| ||||| ||||||||
Db  241786 CATTTACAGCCTGTGCCTGACTTTGCTGGGCTTCCTCATTCCTCTGTCTGTAATGTGCTT 241727

Qy     646 CTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCT 705
           |||||||||||| |||||||||||| |||||| ||||||||||||||| |||||||  ||
Db  241726 CTTCTACTACAAAATGGTAGTCTTCCTAAAGAAGAGGAGCCAGCAGCAGGCAACTGTGCT 241667

Qy     706 GCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTT 765
             | ||| ||||||| |  ||||||||||||||||| || |||||||||||| |||||||
Db  241666 ATCGCTGAACAAACCTCTGCGCCTGGTGGTCCTGGCAGTGGTGATCTTCTCTGTACTCTT 241607

Qy     766 CACACCCTATCATATCATGCGCAATTT 792
            ||||| || ||||||||||||||| |
Db  241606 TACACCTTACCATATCATGCGCAATGT 241580
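
The middle line of each alignment block marks the columns where query (Qy) and database (Db) bases are identical. A minimal sketch of how such a line can be regenerated from the two aligned rows follows; `match_line` is a hypothetical helper, and the rule that gap characters and N never count as matches is an assumption chosen to agree with the match counts reported in this section, not a documented property of the search program.

```python
def match_line(qy, db):
    """Return a string with '|' under identical A/C/G/T columns and ' ' elsewhere."""
    if len(qy) != len(db):
        raise ValueError("aligned rows must have the same number of columns")
    return "".join(
        "|" if a == b and a in "ACGT" else " "
        for a, b in zip(qy.upper(), db.upper())
    )


# Last alignment row of the AC110839 hit (query 766-792, database 241606-241580).
qy = "CACACCCTATCATATCATGCGCAATTT"
db = "TACACCTTACCATATCATGCGCAATGT"
line = match_line(qy, db)
print(line)
print(line.count("|"))
```

Summing `line.count("|")` over all rows of a hit gives a quick consistency check against the reported "Matches" figure.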



RESULT 8
AF247785
LOCUS       AF247785     1325 bp    mRNA    linear   PRI 26-MAR-2002
DEFINITION  Homo sapiens P2Y purinoceptor 1 mRNA, complete cds.
ACCESSION   AF247785
VERSION     AF247785.1  GI:19716154
KEYWORDS    .
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 1325)
  AUTHORS   Zhang,W., Li,N., Wan,T. and Cao,X.
  TITLE     Human P2Y purinoceptor 1
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 1325)
  AUTHORS   Zhang,W., Li,N., Wan,T. and Cao,X.
  TITLE     Direct Submission
  JOURNAL   Submitted (21-MAR-2000) Department of Immunology, Second Military
            Medical University & Shanghai Brilliance Biotechnology Institute,
            800 Xiangyin Rd., Shanghai 200433, P.R. China
FEATURES             Location/Qualifiers
     source          1..1325
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
     CDS             69..1073
                     /codon_start=1
                     /product="P2Y purinoceptor 1"
                     /protein_id="AAL95690.1"
                     /db_xref="GI:19716155"
                     /translation="MLGIMAWNATCKNWLAAEAALEKYYLSIFYGIEFVVGVLGNTIV
                     VYGYIFSLKNWNSSNIYLFNLSVSDLAFLCTLPMLIRSYANGNWIYGDVLCISNRYVL
                     HANLYTSILFLTFISIDRYLIIKYPFREHLLQKKEFAILISLAIWVLVTLELLPILPL
                     INPVITDNGTTCNDFASSGDPNYNLIYSMCLTLLGFLIPLFVMCFFYYKIALFLKQRN
                     RQVATALPLEKPLNLVIMAVVIFSVLFTPYHVMRNVRIASRLGSWKQYQCTQVVINSF
                     YIVTRPLAFLNSVINPVFYFLLGDHFRDMLMNQLRHNFKSLTSFSRWAHELLLSFREK"
BASE COUNT      359 a    292 c    261 g    413 t
ORIGIN

Query Match 38.4%; Score 592.4; DB 9; Length 1325;
Best Local Similarity 75.3%; Pred. No. 1.7e-121;
Matches 764; Conservative 0; Mismatches 246; Indels 4; Gaps 2;



Qy      39 GCAGAATGGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATA 98
           | |  ||||||  ||||  | ||||  | || ||| ||||| ||||||||  | || | |
Db      76 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 135

Qy      99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158
           |||||||||| ||    |||||||  || |||||| || | ||| | ||||| |||  ||
Db     136 AGTACTACCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCA 195

Qy     159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218
            ||| || | ||||||| ||||||   ||||||||||||||||||| ||| | ||||| |
Db     196 TTGTTGTTTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCT 255

Qy     219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278
           ||||||| ||  |||||||||| ||||| ||||||||||| ||||| ||||||| |||||
Db     256 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 315

Qy     279 ATGCCAATGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTC 338
           |||||||||  ||  |||  |||||||| || ||||| ||||||||||||||||||||||
Db     316 ATGCCAATGGAAACTGGATATATGGAGACGTGCTCTGCATAAGCAACCGATATGTGCTTC 375

Qy     339 ACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGC 398
           |  |||||||||| |||||||| ||||| |||||||| || ||||| || |||||  ||
Db     376 ATGCCAACCTCTATACCAGCATTCTCTTTCTCACTTTTATCAGCATAGATCGATACTTGA 435

Qy     399 TCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCT 458
           | || ||||| ||||||||||||||| |||| |||||||| || ||||| ||||||||||
Db     436 TAATTAAGTATCCTTTCCGAGAACACCTTCTGCAAAAGAAAGAGTTTGCTATTTTAATCT 495

Qy     459 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518
           |  ||||  | ||||  ||||| ||||||||  | |||||||| ||  |  | || |||
Db     496 CCTTGGCCATTTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 555

Qy     519 CTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAAC 578
           ||||   ||  ||  | ||||  | ||| |  || | ||||||||||||| ||||  |
Db     556 CTGTTATAACTGACAATGGCACCACCTGTAATGATTTTGCAAGTTCTGGAGACCCCAACT 615

Qy     579 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638
           |||| |||||||||||| | || || ||  ||||||| ||||| |||||||| | |||||
Db     616 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 675

Qy     639 TGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAA 698
           |||| ||||| || |||||||| |   ||||| |||||  |||||    ||||   || |
Db     676 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 735

Qy     699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758
           |||| ||||| || || || || |    | ||||  || |||| || || |||||||||
Db     736 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 795

Qy     759 TACTCTTCACACCCTATCATATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTT 818
           | || || |||||||||||  ||||||| ||| |||||||||| ||||||||||  ||||
Db     796 TGCTTTTTACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTT 855

Qy 819 G GCCACAAGGATGT ACACAGAAGGCCAT CAAAT CTATATACACACT GACACGGCCT C 875 

I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 

Db 856 GGAAGCAGT AT CAGT GC ACT C AGGT C GT CAT CAACT C CTTT T ACAT T GT GAC AC GGC CT T 915 

Qy 87 6 TGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTCCTCATGGGAGACCATTACA 935 

I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I II II I I I I I I I II I II 
Db 916 TGGCCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 975 

Qy 936 GAGAGAT GCT GATTAGTAAGTT CAGACAATACTTCAAGTCC CTTACAT CCTTCAGGACAT 995 

I II I I I I I I II I I I I I I I I I I II I I I I I I I I I I I II I I I II II I II 
Db 97 6 GGGACAT G CT GAT GAAT CAAC T GAGACACAACT T CAAATC C CT T ACAT C CT T T AGCAGAT 1035 

Qy 996 GAGCT GCT GGAT G CAGGT CT T C ACT CAGC CAAAA- T GAGACACTT GATAAACAG 1048 

I I I I III I I I I I I I I I I I I I I M I I I I I I I I I M 

Db 1036 GGGCTCATGAACTCCTACTTTCATTCAGAGAAAAGTGAGGGGCTTGTGAAACAG 108 9 



RESULT 9
AX549281
LOCUS AX549281 1380 bp DNA linear PAT 26-NOV-2002
DEFINITION Sequence 566 from Patent WO02061087.
ACCESSION AX549281
VERSION AX549281.1 GI:25813951
KEYWORDS .
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1
AUTHORS Burmer,G.C., Roush,C.L. and Brown,J.P.
TITLE Antigenic peptides, such as for G protein-coupled receptors
(GPCRs), antibodies thereto, and systems for identifying such
antigenic peptides
JOURNAL Patent: WO 02061087-A 566 08-AUG-2002;
Lifespan Biosciences, Inc. (US)
FEATURES Location/Qualifiers
source 1..1380
/organism="Homo sapiens"
/mol_type="genomic DNA"
/db_xref="taxon:9606"
BASE COUNT 383 a 294 c 274 g 429 t
ORIGIN



Query Match 38.4%; Score 592.4; DB 6; Length 1380;
Best Local Similarity 75.3%; Pred. No. 1.7e-121;
Matches 764; Conservative 0; Mismatches 246; Indels 4; Gaps 2;



Qy 39 GCAGAATGGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATA 98
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 50 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 109



99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158 
I I I I I I I I I I I I I I I I I I I M I I I | | I I I | M | | | | | | | | | | || 

110 AGT ACT ACCTT T C CAT T T T TTAT GG GAT T GAGT T C GTT GT GG GAGT C CT T GGAAAT ACCA 169 

159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218 

Ml M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I III I I I I I I I 
17 0 TT GT T GTT TAC GGCT AC AT CTT CT CTCT GAAGAACT GGAACAGC AGTAAT AT T TAT CT CT 229 

219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278 

I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I 
230 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 289 

27 9 AT GC CAAT GAT AAGGGGAC CT AT G GAGAT GT TCT CT GT AT AAGCAAC C GAT AT GT GCTT C 338 

I I I I I I I I I M Ml I I I I I I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I 
290 AT GC CAAT GGAAACT GGAT AT AT GGAGAC GT GC TCT GCAT AAGCAAC C GAT AT GT GCT T C 349 

339 ACAC CAAC C T CT ACAC C AGCAT CCTCTTCCT CACT TT CAT T AGCAT GGAC C GAT AT CT GC 398 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I || 
350 AT GC CAAC CT CT AT AC C AG CAT TCT CT TT CT CACT T TTAT C AGCAT AGAT CGATACTT GA 4 09 

399 T CAT GAAGT AC C CT T T C C GAGAAC ACTT T CT ACAAAAGAAGGAAT T T GC CAT T T TAAT CT 458 

I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I II II 
410 TAAT TAAGT AT C CT T T C C GAGAACAC CT T CT GCAAAAGAAAGAGTT T GCT AT T T TAAT CT 4 69 

459 CGCT GGCT GTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTT CAT CAATT 518 

I I I I I I I I I I I I I I I I I I M I I I I II I I I I II I I I I I I I I I 
470 CCTTGGCCATTTGGGTTTTAGT7^ACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 529 

519 CT GT CCCAAAAGAAGAGGGCAGTAACTGCATC GACTATGCAAGTT CT GGAAAC C CTGAAC 578 

I I I I M M I I I I I I I I I I I I I I I I I I I I I I M I I | | | | | 
530 C T GT T AT AACT GAC AAT G GC AC C AC C T GT AAT GAT T T T GCAAGT TCT GGAGAC C C CAAC T 589 

57 9 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638 

I I I I I I I II I I I I I I I I II II II I I I I I I I I I I I I I I I I I I I I I I I I I I 
590 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 64 9 

63 9 T GT GCTT CTT CTAC TAC AAGAT GGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAA 698 

M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I Ml 

650 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 709 

699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758 

MM UNI M M M II I I I I I I I I I I I I I I I I I I I I I I I I I 

710 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 769 

759 TACT CTT CACACCCT AT CAT AT CAT G C GCAAT T T GAGGAT C GCCT CAC GCC T GGAT AGT T 818 

I M M I I I I I I I I I I I I I I I I I I III I II I I I I I I I I I I I I I I I I I I I II 
77 0 TGCTTTTTACACCCTATCACGTCATGCGGAATGT GAGGAT CGCTTCACGCCTGGGGAGTT 82 9 

819 G GCCACAAGGAT GTACACAGAAGGCCAT CAAAT CT AT AT ACAC ACT GAC AC GGCCT C 875 

I M I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 

830 GGAAG C AGT AT C AGT GC ACT CAGGT C GT CAT CAAC T CC TT T TAC AT T GTGAC AC GGC CTT 8 89 



87 6 T GG C CT T T CT GAACAGT GC CAT CAAT C C CAT CTT CT ACT T C CT C AT G GGAGAC CAT TAC A 935 

I I I I I I I I I I I I I I I I I I I I I I I I II I M II I I II II I I I I I I I II I II 
890 TGGCCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 94 9 



Qy 936 GAGAGAT GCT GATTAGTAAGTT CAGACAATACTT CAAGT C CCTTACATCCTTCAGGACAT 995 

I I I I I I I I I I I I I I I I I II I I I I M I I I I I I I I I I I I I I I I I I I I I 
Db 950 G GGAC AT GCT GAT GAAT CAACT GAGACACAACT T CAAAT C C C TT AC AT C C T T T AGC AGAT 1009 

Qy 996 GAGCT GCT GGATGCAGGT CTTCACTCAGCCAAAA- T GAGACACTT GATAAACAG 104 8 

I I I I III I I II I I II I I I I I I I I I I I I I I I I I I I 

Db 1010 GGGCT CAT GAACTCCTACTTT CATT CAGAGAAAAGT GAGGGGCTT GT GAAACAG 1063 



RESULT 10 
AF348078 

LOCUS AF348078 1380 bp mRNA linear PRI 03-APR-2001 

DEFINITION Homo sapiens G-protein coupled receptor 91 (GPR91) mRNA, complete
cds.
ACCESSION AF348078
VERSION AF348078.1 GI:13517982
KEYWORDS
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 1380)
AUTHORS Wittenberger,T., Schaller,H.C. and Hellebrand,S.
TITLE An expressed sequence tag (EST) data mining strategy succeeding in
the discovery of new G-protein coupled receptors
JOURNAL J. Mol. Biol. 307 (3), 799-813 (2001)
MEDLINE 21172992
PUBMED 11273702
REFERENCE 2 (bases 1 to 1380)
AUTHORS Wittenberger,T., Schaller,C.H. and Hellebrand,S.
TITLE Direct Submission
JOURNAL Submitted (08-FEB-2001) ZMNH, Institut fur
Entwicklungsneurobiologie, Martinistr. 52, Hamburg 20246, Germany
FEATURES Location/Qualifiers
source 1..1380
/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="taxon:9606"
/chromosome="3"
/map="3q24-q25.1"
gene 1..1380
/gene="GPR91"
CDS 55..1047
/gene="GPR91"
/note="orphan receptor"
/codon_start=1
/product="G-protein coupled receptor 91"
/protein_id="AAK29080.1"
/db_xref="GI:13517983"
/translation="MAWNATCKNWLAAEAALEKYYLSIFYGIEFVVGVLGNTIVVYGY
IFSLKNWNSSNIYLFNLSVSDLAFLCTLPMLIRSYANGNWIYGDVLCISNRYVLHANL
YTSILFLTFISIDRYLIIKYPFREHLLQKKEFAILISLAIWVLVTLELLPILPLINPV
ITDNGTTCNDFASSGDPNYNLIYSMCLTLLGFLIPLFVMCFFYYKIALFLKQRNRQVA
TALPLEKPLNLVIMAVVIFSVLFTPYHVMRNVRIASRLGSWKQYQCTQVVINSFYIVT
RPLAFLNSVINPVFYFLLGDHFRDMLMNQLRHNFKSLTSFSRWAHELLLSFREK"

BASE COUNT 383 a 294 c 274 g 429 t 

ORIGIN 



Query Match 38.4%; Score 592.4; DB 9; Length 1380;
Best Local Similarity 75.3%; Pred. No. 1.7e-121;
Matches 764; Conservative 0; Mismatches 246; Indels 4; Gaps 2;



Qy 39 GCAGAAT GGCACAGAATTTAT CTT GT GAGAATT GGTTGGCAACAGAGGCTAT CTTGAAT A 98 

II I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 50 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 109 

Qy 99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158 

M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 110 AGTACTACCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCA 169 

Qy 159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218 

III II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I 
Db 170 TT GT T GTTT AC GGCTACAT C TT CT CT CT GAAGAACT GGAAC AGCAGT AAT AT T TAT C T C T 229 

Qy 219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 27 8 

I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 230 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 289 

Qy 279 AT G C CAAT GAT AAGG GGAC CT AT GGAGAT GT T CT CT GT ATAAGCAAC C GAT AT GT GC T T C 338 

I I I I 1 I I I I II III I I I I I I I I II I I I I I I I I I II I I I I I M I I I I I I I I I 
Db 2 90 AT GC CAAT GGAAACT GGAT AT AT G GAGAC GT G CT CT GC AT AAGCAAC C GAT AT GT GCT T C 34 9 

Qy 339 AC AC CAAC CT CT ACACCAG CAT C CT CTT C CT C ACT TTCAT TAG CAT G GAC C GAT AT CT GC 398 

I I I I I I I M II I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 350 AT GC CAAC CT CT AT ACCAGCAT T CT CTTT CT C ACT TT T AT CAG C AT AGAT C GAT ACT T GA 4 09 

Qy 399 T CAT GAAGT AC C CT T TCC GAGAAC ACTTT CT ACAAAAGAAGGAATTT GC CAT T T TAAT CT 4 58 

I II I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 
Db 410 TAATTAAGT AT C CT TT C C GAGAACAC CT T CT G CAAAAGAAAGAGTTT GCT AT T T TAAT CT 4 69 

Qy 459 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518 

I I I I I I I I I I I I I II I I I I I M I I I I II I I I I I I I I I I I I I 
Db 470 CCTTGGCCATTTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 52 9 

Qy 519 CT GT C C CAAAAGAAGAGGG CAGTAACT GCAT C GAC TAT GCAAGT T CT G GAAAC C CT GAAC 57 8 

I I I I II II I I I I I I I I I I I I I I I I I I I I I I I II I I I II I 
Db 530 CT GT T AT AACT GACAAT GGCAC C AC CT GT AAT GAT TT T GCAAGTT CT GGAGAC C CCAACT 58 9 

Qy 579 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638 

I I I I I I I I I I I I I I I I I II II II II I I I I I I I I I I I I I I I I I I I I I I I I 
Db 590 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 64 9 

Qy 639 TGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAA 698 

I I I I I I I I I I I I I I I I I I I I I I I I I II I I I | | | M UN III 

Db 650 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 7 09 

Qy 699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758 

I II I I I I I I I I II I I I I I I I I I I M I I I I I I I I I I I I I I I I I 

Db 710 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 7 69 



Qy 759 TACTCTTCACACCCTATCATATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTT 818
I II II I I I I I I I I I I I I I I I I II III I I I I I I I I I I I I I I I I I I II I I I I
Db 770 TGCTTTTTACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTT 829



Qy 819 G GCCACAAGGAT GT ACACAGAAGGCCAT CAAATCTAT ATACACACT GACACGGCCT C 875 

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 830 GGAAGCAGTATCAGTGCACTCAGGTCGTCATCAACTCCTTTTACATTGTGACACGGCCTT 88 9 

Qy 87 6 T GGC C T T T CT GAAC AGT G C CAT CAAT C CC AT CTT CT ACT T C CT CAT GGGAGAC CAT T AC A 935 

II I I M I I I I I I I I I I I I I I I I I I II I I I I I I I II II I I I I I I I II I II 

Db 8 90 TGGCCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 94 9 

Qy 936 GAGAGAT G CT GAT T AGTAAGTT C AGACAAT ACT T CAAGT C C CT TAC AT C CT T CAGGAC AT 995 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 
Db 950 GGGACATGCTGATGAAT CAACT GAGACACAACTT CAAAT CCCTTACATCCTTTAGCAGAT 1009 

Qy 996 GAGCT G CT GGAT GC AGGT CT T CACT C AGC CAAAA- T GAGACACT T GAT AAACAG 104 8 

I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1010 GGGCT CAT GAACT C CTACTT T CAT T C AGAGAAAAGT GAG GGG CTT GT GAAAC AG 1063 
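
The summary figures printed for this hit appear to follow from simple ratios of the reported counts. The sketch below is an assumption inferred from the numbers themselves, not taken from the search program's documentation: it treats Best Local Similarity as matches divided by all aligned columns (matches + mismatches + indels) and Query Match as the hit score divided by the perfect score (1543, from the report header). The function names are hypothetical.

```python
def best_local_similarity(matches: int, mismatches: int, indels: int) -> float:
    """Percent of aligned columns that are identical (assumed definition)."""
    columns = matches + mismatches + indels
    return 100.0 * matches / columns


def query_match(score: float, perfect_score: float) -> float:
    """Hit score as a percentage of the query's perfect score (assumed definition)."""
    return 100.0 * score / perfect_score


if __name__ == "__main__":
    # Counts as printed for RESULT 10 (AF348078); perfect score from the report header.
    print(round(best_local_similarity(764, 246, 4), 1))  # 75.3
    print(round(query_match(592.4, 1543), 1))            # 38.4
```

Under these assumed definitions the values reproduce the printed 75.3% and 38.4%; the same ratios also reproduce the 75.5% and 38.3% reported for RESULT 13 (760 matches, 243 mismatches, 4 indels, score 590.2).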



RESULT 11
BC030948
LOCUS BC030948 1449 bp mRNA linear PRI 13-JUN-2002
DEFINITION Homo sapiens, G protein-coupled receptor 91, clone MGC:32514
IMAGE:4594810, mRNA, complete cds.
ACCESSION BC030948
VERSION BC030948.1 GI:21410927
KEYWORDS MGC.
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 1449)
AUTHORS Strausberg,R.
TITLE Direct Submission
JOURNAL Submitted (03-JUN-2002) National Institutes of Health, Mammalian
Gene Collection (MGC), Cancer Genomics Office, National Cancer
Institute, 31 Center Drive, Room 11A03, Bethesda, MD 20892-2590,
USA
REMARK NIH-MGC Project URL: http://mgc.nci.nih.gov
COMMENT Contact: MGC help desk
Email: cgapbs-r@mail.nih.gov
Tissue Procurement: CLONTECH
cDNA Library Preparation: CLONTECH Laboratories, Inc.
cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
DNA Sequencing by: Sequencing Group at the Stanford Human Genome
Center, Stanford University School of Medicine, Stanford, CA 94305
Web site: http://www-shgc.stanford.edu
Contact: (Dickson, Mark) mcd@paxil.stanford.edu
Dickson, M., Schmutz, J., Grimwood, J., Rodriquez, A., and Myers,
R. M.
Clone distribution: MGC clone distribution information can be found
through the I.M.A.G.E. Consortium/LLNL at: http://image.llnl.gov
Series: IRAL Plate: 41 Row: e Column: 17
This clone was selected for full length sequencing because it
passed the following selection criteria: matched mRNA gi: 14780893.
FEATURES Location/Qualifiers
source 1..1449
/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="LocusID:56670"
/db_xref="taxon:9606"
/clone="MGC:32514 IMAGE:4594810"
/tissue_type="Kidney"
/clone_lib="NIH_MGC_75"
/lab_host="DH10B"
/note="Vector: pDNR-LIB"
CDS 100..1104
/codon_start=1
/product="G protein-coupled receptor 91"
/protein_id="AAH30948.1"
/db_xref="GI:21410928"
/translation="MLGIMAWNATCKNWLAAEAALEKYYLSIFYGIEFVVGVLGNTIV
VYGYIFSLKNWNSSNIYLFNLSVSDLAFLCTLPMLIRSYANGNWIYGDVLCISNRYVL
HANLYTSILFLTFISIDRYLIIKYPFREHLLQKKEFAILISLAIWVLVTLELLPILPL
INPVITDNGTTCNDFASSGDPNYNLIYSMCLTLLGFLIPLFVMCFFYYKIALFLKQRN
RQVATALPLEKPLNLVIMAVVIFSVLFTPYHVMRNVRIASRLGSWKQYQCTQVVINSF
YIVTRPLAFLNSVINPVFYFLLGDHFRDMLMNQLRHNFKSLTSFSRWAHELLLSFREK"
BASE COUNT 411 a 308 c 287 g 443 t
ORIGIN



Query Match 38.4%; Score 592.4; DB 9; Length 1449;
Best Local Similarity 75.3%; Pred. No. 1.7e-121;
Matches 764; Conservative 0; Mismatches 246; Indels 4; Gaps 2;



Qy 39 GCAGAATGGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATA 98
II I I I I I I INI I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 107 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 166

Qy 99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158
I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I M ||
Db 167 AGTACTACCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCA 226

Qy 159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218
III II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I III I I I I I I I
Db 227 TTGTTGTTTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCT 286

Qy 219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278
I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I
Db 287 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 346

Qy 279 ATGCCAATGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTC 338
MINIMI II III I I I I I II I II II I I I I I I I I I M I I I II I I I I II I I I
Db 347 ATGCCAATGGAAACTGGATATATGGAGACGTGCTCTGCATAAGCAACCGATATGTGCTTC 406

Qy 339 ACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGC 398
I I I I I II I I II I I I I I I I I I I I I I I I I II I I I II I II I I II I I II I II
Db 407 ATGCCAACCTCTATACCAGCATTCTCTTTCTCACTTTTATCAGCATAGATCGATACTTGA 466

Qy 399 TCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCT 458
I M I M II I II I I M I I I I I I I I I I I I I I I II II I II II I I I I I I I I I I I I I
Db 467 TAATTAAGTATCCTTTCCGAGAACACCTTCTGCAAAAGAAAGAGTTTGCTATTTTAATCT 526

Qy 459 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518
I I I I I I I I I I I I I I I I I I II I II I I I I I I I II I I I I II I I I
Db 527 CCTTGGCCATTTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 586



Qy 519 CTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAAC 57 8 

I I I I M II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 587 CT GT TATAACT GACAAT GGC AC CAC CT GTAAT GATT TT GCAAGTT CT GGAGAC C C CAACT 64 6 

Qy 579 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638 

I I I I I I I I I I I I I I I I I II II II I I I I I I I I I I I I I I I I I I I I I Mill 
Db 647 ACAACCTCATTTACAGCATGTGTCT7VACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 7 06 

Qy 639 TGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCT^AGCAA 698 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I III 

Db 707 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 766 

Qy 699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I 

Db 7 67 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 826 

Qy 7 59 TACT CTT C ACAC C CT AT CAT AT CAT GC GCAATT T GAGGAT C GCCT CAC G C CT G GATAGTT 818 

I M M I I I I I I I I I I I I I I I I II III I I I I I I I I I I I I I I I I II I I MM 
Db 827 T GC T T T T T ACAC C CT AT CAC GT C AT GC GGAAT GT GAGGAT C GCT T CAC GCCT GGGGAGTT 886 

Qy 819 G GC CACAAGGAT GT ACACAGAAGGC CAT CAAAT C TAT AT AC AC ACT GACAC GG C CT C 875 

I M I I I II I I I I M I I M II I M II II II II I II II 

Db 8 87 GGAAGCAGTATCAGTGCACTCAGGTCGTCATCAACTCCTTTTACATTGTGACACGGCCTT 94 6 

Qy 87 6 TGGCCTTT CT GAAC AGT GC CAT CAAT C C CAT CT T CT ACT T C CT CAT GG GAGAC CAT T AC A 935 

I M I I I II I I II I M I II II I II I II I I I I I II M II I M M II II I II 
Db 947 TGGCCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 1006 

Qy 936 GAGAGAT GCT GAT T AGT AAGT T C AGACAAT ACT T CAAGT C C CT T AC AT C CT T CAGGACAT 995 

I II I II M I M I I I I II II I II II II I II II I II M I II I I II I I I 
Db 1007 GGGACAT G CT GAT GAAT CAACT GAGAC AC AACT TCAAAT C C CT T ACAT C CT T TAGCAGAT 1066 

Qy 996 GAGCT GCT GGATGCAGGT CTT CACTCAGCCAAAA- T GAGACACTT GATAAACAG 1048 

I II I II I I I I I I II II I II I II I I I II I II II II 

Db 1067 GGGCT CAT GAACT C CT ACTTT CATTCAGAGAAAAGT GAGGGGCTT GT GAAACAG 1120 



RESULT 12
AX342665
LOCUS AX342665 1542 bp DNA linear PAT 12-JAN-2002
DEFINITION Sequence 20 from Patent WO0198351.
ACCESSION AX342665
VERSION AX342665.1 GI:18152045
KEYWORDS
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1
AUTHORS Lai,P., Baughn,M.R., Hafalia,A.J., Nguyen,D.B., Gandhi,A.R.,
Kallick,D.A., Griffin,J.A., Yue,H., Khan,F.A., Patterson,C.,
Lu,D.A., Tribouley,C.M., Lu,Y., Walia,N.K., Graul,R., Yao,M.G.,
Yang,J., Ramkumar,J., Au-Young,J., Hernandez,R., Walsh,R.T. and
Borowsky,M.L.
JOURNAL Patent: WO 0198351-A 20 27-DEC-2001;
Incyte Genomics, Inc. (US)
FEATURES Location/Qualifiers
source 1..1542
/organism="Homo sapiens"
/mol_type="genomic DNA"
/db_xref="taxon:9606"
/note="Incyte ID No: 3485895CB1"
BASE COUNT 428 a 327 c 315 g 472 t
ORIGIN



Query Match 38.4%; Score 592.4; DB 6; Length 1542;
Best Local Similarity 75.3%; Pred. No. 1.7e-121;
Matches 764; Conservative 0; Mismatches 246; Indels 4; Gaps 2;



Qy 39 GCAGAATGGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATA 98
II MINI I I I I I I I I I I I I I I I II I I I I II I I I I I I II I
Db 205 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 264

Qy 99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158
I I I I I I I I I I I I I I I I I I I I I I I I I I I II | | | | | | | | | | || | ||
Db 265 AGTACTACCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCA 324

Qy 159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218
Ml II I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I
Db 325 TTGTTGTTTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCT 384

Qy 219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278
I 1 I I I I I M I I I I I II I II Mill I I I I I I I II I I I I I I I I I I I I I I I II I I
Db 385 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 444

Qy 279 ATGCCAATGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTC 338
I I I I I I I I I II II I I I I I I I I I II I I I I I I I M I I I I I I I I I I I I I I I I I I
Db 445 ATGCCAATGGAAACTGGATATATGGAGACGTGCTCTGCATAAGCAACCGATATGTGCTTC 504

Qy 339 ACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGC 398
I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I II Mill II
Db 505 ATGCCAACCTCTATACCAGCATTCTCTTTCTCACTTTTATCAGCATAGATCGATACTTGA 564

Qy 399 TCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCT 458
I II I I I M I II I II II II II II I I II I I II II I I I II I II II II I I I II II I
Db 565 TAATTAAGTATCCTTTCCGAGAACACCTTCTGCAAAAGAAAGAGTTTGCTATTTTAATCT 624

Qy 459 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518
I MM I II II II II I II II I II I I II II I M I II I I II II I
Db 625 CCTTGGCCATTTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 684

Qy 519 CTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAAC 578
M M M II I II II I II I I I I I II I II II II I II I II II I
Db 685 CTGTTATAACTGACAATGGCACCACCTGTAATGATTTTGCAAGTTCTGGAGACCCCAACT 744

Qy 579 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638
I I I I I I II I I II II I I I M II II II I II II II II I I II II II I I Mill
Db 745 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 804

Qy 639 TGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAA 698
M II I I I M I I I I II II II I I II II I I M I II I I I II I I III
Db 805 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 864



Qy 699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 8 65 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 924 

Qy 759 TACTCTTCACACCCTATCATATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTT 818 

I II II I M I I I I I II I I I I I I I I III I I I I I I I I I I I I I I I I II I I I I II 
Db 925 TGCTTTTTACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTT 984 

Qy 819 G GCCACAAGGAT GTACACAGAAGGCCAT CAAATCTATATACACACT GACACGGCCTC 87 5 

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 985 GGAAGCAGT AT CAGT GCACT CAGGT CGT CAT CAACT CCTTTTACATT GTGACACGGCCTT 1044 

Qy 876 TGGCCTTTCTGAACAGTGCCATC7VATCCCATCTTCTACTTCCTCATGGGAGACCATTACA 935 

I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II II I II I I I I II I II 
Db 1045 TGGCCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 1104 

Qy 936 GAGAGATGCT GATTAGTAAGTT CAGACAATACTT CAAGT CCCTTACAT CCTTCAGGACAT 995 

I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 

Db 1105 GGGAC AT G CT GAT GAAT CAACT GAGACACAACTT CAAAT C C C TT ACAT C C T T T AGCAGAT 1164 

Qy 996 GAGCTGCT GGATGCAGGT CTT CACT CAGC CAAAA- T GAGACACTT GATAAACAG 104 8 

I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I M 

Db 1165 GGGCTCAT GAACT CCTACTTT CATTCAGAGAAAAGT GAGGGGCTT GTGAAACAG 1218 



RESULT 13
AC116026
LOCUS AC116026 90343 bp DNA linear PRI 09-APR-2002
DEFINITION Homo sapiens 3 BAC RP11-3F11 (Roswell Park Cancer Institute Human
BAC Library) complete sequence.
ACCESSION AC116026
VERSION AC116026.1 GI:19697319
KEYWORDS HTG.
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 90343)
AUTHORS Muzny,D.M., Adams,C., Adio-Oduola,B., Ali-osman,F.R., Allen,C.,
Alsbrooks,S.L., Amaratunge,H.C., Are,J.R., Ayele,M., Banks,T.,
Barbaria,J., Benton,J., Bimage,K., Blankenburg,K., Bonnin,D.,
Bouck,J., Bowie,S., Brieva,M., Brown,E., Brown,M., Bryant,N.P.,
Buhay,C., Burch,P., Burkett,C., Burrell,K.L., Byrd,N.C.,
Carron,T.F., Carter,M., Cavazos,S.R., Chacko,J., Chavez,D.,
Chen,G., Chen,R., Chen,Z., Chowdhry,I., Christopoulos,C.,
Cleveland,C.D., Cox,C., Coyle,M.D., Dathorne,S.R., David,R.,
Davila,M.L., Davis,C., Davy-Carroll,L., Dederich,D.A.,
Delaney,K.R., Delgado,O., Denn,A.L., Ding,Y., Dinh,H.H.,
Douthwaite,K.J., Draper,H., Dugan-Rocha,S., Durbin,K.J.,
Earnhart,C., Edgar,D., Edwards,C.C., Elhaj,C., Escotto,M.,
Falls,T., Ferraguto,D., Flagg,N., Ford,J., Foster,P., Frantz,P.,
Gabisi,A., Gao,J., Garcia,A., Garner,T., Garza,N., Gill,R.,
Gorrell,J.H., Guevara,W., Gunaratne,P., Hale,S., Hamilton,K.,
Harris,C., Harris,K., Hart,M., Havlak,P., Hawes,A., He,X.,
Hernandez,J., Hernandez,O., Hodgson,A., Hogues,M., Holloway,C.,
Hollins,B., Homsi,F., Howard,S., Huber,J., Hulyk,S., Hume,J.,
Jackson,L.E., Jacobson,B., Jia,Y., Johnson,R., Jolivet,S.,
Joudah,S., Karlsson,E., Kelly,S., Khan,U., King,L., Korvah,J.,
Kovar,C., Kratovic,J., Kureshi,A., Landry,N., Leal,B., Lewis,L.C.,
Lewis,L., Li,J., Li,Z., Lichtarge,O., Lieu,C., Liu,J., Liu,W.,
Loulseged,H., Lozado,R.J., Lu,X., Lucier,A., Lucier,R., Luna,R.,
Ma,J., Maheshwari,M., Mapua,P., Martin,R., Martindale,A.,
Martinez,E., Massey,E., Mawhiney,E., McLeod,M.P., Meador,M.,
Mei,G., Metzker,M., Miner,G., Miner,Z., Mitchell,T., Mohabbat,K.,
Moore,S., Morgan,M., Moorish,T., Morris,S., Moser,M., Neal,D.,
Nelson,D., Newtson,J., Newtson,N., Nguyen,A., Nguyen,N., Nguyen,N.,
Nickerson,E., Nwokenkwo,S., Oguh,M., Okwuonu,G., Oragunye,N.,
Oviedo,R., Pace,A., Payton,B., Peery,J., Perez,L., Peters,L.,
Pickens,R., Primus,E., Pu,L.L., Quiles,M., Ren,Y., Rives,M.,
Rojas,A., Rojubokan,I., Rolfe,M., Ruiz,S., Savery,G., Scherer,S.,
Scott,G., Shen,H., Shooshtari,N., Sisson,I., Sodergren,E.,
Sonaike,T., Sparks,A., Stanley,H., Stone,H., Sutton,A., Svatek,A.,
Tabor,P., Tamerisa,A., Tamerisa,K., Tang,H., Tansey,J., Taylor,C.,
Taylor,T., Telfrod,B., Thomas,N., Thomas,S., Usmani,K., Vasquez,L.,
Vera,V., Villalon,D., Vinson,R., Wang,Q., Wang,S., Ward-Moore,S.,
Warren,R., Washington,C., Watlington,S., Williams,G.,
Williamson,A., Wleczyk,R., Wooden,S., Worley,K., Wu,C., Wu,Y.,
Wu,Y.F., Zhou,J., Zorrilla,S., Naylor,S.L., Weinstock,G. and
Gibbs,R.
TITLE Direct Submission
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 90343)
AUTHORS Worley,K.C.
TITLE Direct Submission
JOURNAL Submitted (23-MAR-2002) Human Genome Sequencing Center, Department
of Molecular and Human Genetics, Baylor College of Medicine, One
Baylor Plaza, Houston, TX 77030, USA
REFERENCE 3 (bases 1 to 90343)
AUTHORS Worley,K.C.
TITLE Direct Submission
JOURNAL Submitted (09-APR-2002) Human Genome Sequencing Center, Department
of Molecular and Human Genetics, Baylor College of Medicine, One
Baylor Plaza, Houston, TX 77030, USA

COMMENT INFORMATION: http://www.hgsc.bcm.tmc.edu/ or email
gc-help@bcm.tmc.edu

CLONE LENGTH: This sequence does not necessarily represent the
entire insert of this clone. Overlapping regions of clones are only
sequenced and submitted once, so the sequence for the remainder of
the insert may be found in the record for the adjacent clones.
Overlapping clones are noted at the beginning and end of the
Features listing.

ANNOTATION OF FEATURES:

STSs are identified using ePCR (Genome Res. 7:541-550) searches
of a local database that includes entries from dbSTS, GDB, and
local mapping efforts.

Repeats are identified using RepeatMasker (A. Smit and P. Green,
unpublished.) for Human and Mouse sequences.

Genes and Region of sequence similarity are identified by BLAST
(Nuc. Acids Res. 25:3389-3402) similarity (expect < 1e-34) to the
EST and cDNA sequences. Genes demonstrate at least two exons
flanked by consensus splice sites that maintained sequence
continuity across the splice junctions. Sequences that are not
identical matches are annotated as similar.

SEQUENCING READ COVERAGE: Sequencing is completed to a minimum
standard of double strand coverage with a minimum of 2 clones and
reads with no ambiguities or 2 chemistries with a minimum of 2
clones and 3 reads with no ambiguities. If the sequence quality for
a region does not meet this standard, it will be indicated in the
annotation as Low Coverage.

QUALITY OF INDIVIDUAL BASES: This sequence meets stringent quality
standards - estimated error rate less than 1 per 10,000 bases.
Reports of lowest quality individual bases and measures of base
quality are listed below. Description of the metrics can be found
at URL:

http://gc.bcm.tmc.edu:8088/quality.info/genbank.annotation.html.

QUALSTAT-REPORT.
FEATURES Location/Qualifiers
source 1..90343
/organism="Homo sapiens"
/mol_type="genomic DNA"
/db_xref="taxon:9606"
/chromosome="3"
/clone="RP11-3F11"
repeat_region 991..1106
/rpt_family="MER45B"
repeat_region complement(1314..1627)
/rpt_family="AluSx"
repeat_region complement(2137..2430)
/rpt_family="AluY"
repeat_region complement(2568..2741)
/rpt_family="L1M4"
repeat_region complement(2742..3047)
/rpt_family="AluSx"
repeat_region complement(3048..3165)
/rpt_family="L1M4"
repeat_region 4735..4865
/rpt_family="FLAM_C"
repeat_region 5657..5762
/rpt_family="L1MC/D"
repeat_region 5906..6237
/rpt_family="LTR21B"
repeat_region 6289..6773
/rpt_family="HERVFH2.1"
repeat_region complement(8725..9597)
/rpt_family="MER11D"
STS 12399..12689
/standard_name="136046"
repeat_region 13774..13816
/rpt_family="Alu"
repeat_region 13817..13874
/rpt_family="(TA)n"
repeat_region complement(15157..15633)
/rpt_family="L2"
repeat_region 15706..15747
/rpt_family="AT_rich"
repeat_region 16025..16235
/rpt_family="MIR"
repeat_region 16560..16682
/rpt_family="L2"
repeat_region complement(16710..17265)
/rpt_family="LTR49"
repeat_region 18077..18368
/rpt_family="AluSx"
repeat_region complement(18376..18471)
/rpt_family="L2"
repeat_region complement(18486..18859)
/rpt_family="MER57B"
repeat_region complement(20618..20922)
/rpt_family="AluSx"
repeat_region 21337..21363
/rpt_family="AT_rich"
repeat_region 22155..22561
/rpt_family="L1M4"
repeat_region complement(22608..22659)
/rpt_family="L1M4"
repeat_region 22685..23013
/rpt_family="L1MB8"
repeat_region 23103..23399
/rpt_family="AluSg"
repeat_region 23500..23973
/rpt_family="L1ME3A"
repeat_region complement(24027..24305)
/rpt_family="L1MB1"
repeat_region 24304..24655
/rpt_family="L1ME3A"
repeat_region 24656..24697
/rpt_family="MADE1"
repeat_region 25203..25518
/rpt_family="AluJo"
repeat_region 25783..25817
/rpt_family="(TAA)n"
repeat_region 26187..26211
/rpt_family="AT_rich"
repeat_region 27014..27030
/rpt_family="AT_rich"
repeat_region complement(27031..27316)
/rpt_family="AluSx"
repeat_region 27317..27328
/rpt_family="AT_rich"
repeat_region 27574..27615
/rpt_family="(TAGA)n"
STS 28062..28166
/standard_name="24707"
STS 28199..28382
/standard_name="13170"
repeat_region complement(29079..29167)
/rpt_family="MLT1J"
repeat_region 29168..29532
/rpt_family="THE1B"
repeat_region complement(29533..29552)
/rpt_family="MLT1J"
repeat_region 29807..30387



Query Match 38.3%; Score 590.2; DB 9; Length 90343;
Best Local Similarity 75.5%; Pred. No. 6e-121;
Matches 760; Conservative 0; Mismatches 243; Indels 4; Gaps 2;
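The two percentages printed with each alignment can be reproduced from the other figures in the same summary. The sketch below is only an illustration under two assumptions that the report itself does not state: that Query Match is the score divided by the perfect score (1543 for this query), and that Best Local Similarity is the number of matches divided by the number of aligned columns (matches + mismatches + indels). The function names are hypothetical.

```python
def query_match_pct(score, perfect_score):
    """Score as a percentage of the query's perfect (self-match) score.

    Assumption: this is how the report derives "Query Match".
    """
    return 100.0 * score / perfect_score


def best_local_similarity_pct(matches, mismatches, indels):
    """Matches as a percentage of all aligned columns.

    Assumption: aligned columns = matches + mismatches + indels.
    """
    aligned_columns = matches + mismatches + indels
    return 100.0 * matches / aligned_columns


# Figures printed for the alignment above (perfect score 1543).
print(round(query_match_pct(590.2, 1543), 1))            # 38.3
print(round(best_local_similarity_pct(760, 243, 4), 1))  # 75.5
```

Under these assumptions the computed values agree with the printed 38.3% and 75.5%, which suggests, but does not prove, that the report uses the same definitions.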



Qy         46 GGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTA 105
              I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I
Db      80664 GGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAAAGTACTA 80723

Qy        106 CCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGT 165
              I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db      80724 CCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCATTGTTGT 80783

Qy        166 GTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCT 225
              I I I I I I I I I M I I I I I I I I I I II I I I I I I I I I I III I I 1 I I I I I I I I I I I
Db      80784 TTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCTTTAACCT 80843

Qy        226 TTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAA 285
              II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I M I I I I I I I
Db      80844 CTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTTATGCCAA 80903

Qy        286 TGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAA 345
              II II III I I ! I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db      80904 TGGAAACTGGATATATGGAGACGTGCTCTGCATAAGCAACCGATATGTGCTTCATGCCAA 80963

Qy        346 CCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAA 405
              I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I II I I I I I II I II II
Db      80964 CCTCTATACCAGCATTCTCTTTCTCACTTTTATCAGCATAGATCGATACTTGATAATTAA 81023

Qy        406 GTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGC 465
              III I I I I I I I I I I I I I I I I I I I MINIM || | | | | | I I I I I I I I I I I I I I I
Db      81024 GTATCCTTTCCGAGAACACCTTCTGCAAAAGAAAGAGTTTGCTATTTTAATCTCCTTGGC 81083

Qy        466 TGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCC 525
              I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I II M
Db      81084 CATTTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATCCTGTTAT 81143

Qy        526 AAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAACACAATCT 585
              II II I I I I I I I I I I I I I II I I II I I II II I II II | I I I I I I
Db      81144 AACTGACAATGGCACCACCTGTAATGATTTTGCAAGTTCTGGAGACCCCAACTACAACCT 81203

Qy        586 CATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTT 645
              I I II I I I I I I I II II II I M II II Mill I I I I I I I I I I II II i I II II
Db      81204 CATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGATGTGTTT 81263

Qy        646 CTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCT 705
              I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db      81264 CTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTACTGCTCT 81323

Qy        706 GCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTT 765
              I I I I I I I I I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I
Db      81324 GCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTGTGCTTTT 81383

Qy        766 CACACCCTATCATATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTTGGCC 822
              I I I I I I I I I I I I I I I I I I III II II II I I II I I I I I I I II I I I I I I II
Db      81384 TACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTTGGAAGCA 81443

Qy        823 ACAAGGATGTACACAGAAGGCCATCAAATCTATATACACACTGACACGGCCTCTGGCCTT 882
              I I I I I I I I I I I I I I I I I I I I I I I I I | | | | | | | I I I I I I I I
Db      81444 GTATCAGTGCACTCAGGTCGTCATCAACTCCTTTTACATTGTGACACGGCCTTTGGCCTT 81503

Qy        883 TCTGAACAGTGCCATCAATCCCATCTTCTACTTCCTCATGGGAGACCATTACAGAGAGAT 942
              I I I I I I I I I I I I I I I I I II I I I I I I I II II I I I I I I I II I III II II
Db      81504 TCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCAGGGACAT 81563

Qy        943 GCTGATTAGTAAGTTCAGACAATACTTCAAGTCCCTTACATCCTTCAGGACATGAGCTGC 1002
              I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II || | Ml III
Db      81564 GCTGATGAATCAACTGAGACACAACTTCAAATCCCTTACATCCTTTAGCAGATGGGCTCA 81623

Qy       1003 TGGATGCAGGTCTTCACTCAGCCAAAA-TGAGACACTTGATAAACAG 1048
              Ml I I II I I I I I I I I I I I I I I I I I I I I I I I
Db      81624 TGAACTCCTACTTTCATTCAGAGAAAAGTGAGGGGCTTGTGAAACAG 81670



RESULT 14 

AC068647 

LOCUS       AC068647              132745 bp    DNA     linear   PRI 24-JUL-2002
DEFINITION  Homo sapiens 3 BAC RP11-64D22 (Roswell Park Cancer Institute Human
            BAC Library) complete sequence.
ACCESSION   AC068647
VERSION     AC068647.10  GI:19774263
KEYWORDS    HTG.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 132745)
  AUTHORS   Muzny,D.M., Adams,C., Adio-Oduola,B., Ali-osman,F.R., Allege,
            Alsbrooks,S.L., Amaratunge,H.C., Are,J.R., Ayele,M., Banks,T.,
            Barbaria,J., Benton,J., Bimage,K., Blankenburg,K., Bonnin,D.,
            Bouck,J., Bowie,S., Brieva,M., Brown,E., Brown,M., Bryant,N.P.,
            Buhay,C., Burch,P., Burkett,C., Burrell,K.L., Byrd,N.C.,
            Carron,T.F., Carter,M., Cavazos,S.R., Chacko,J., Chavez,D.,
            Chen,G., Chen,R., Chen,Z., Chowdhry,I., Christopoulos,C.,
            Cleveland,C.D., Cox,C., Coyle,M.D., Dathorne,S.R., David,R.,
            Davila,M.L., Davis,C., Davy-Carroll,L., Dederich,D.A.,
            Delaney,K.R., Delgado,O., Denn,A.L., Ding,Y., Dinh,H.H.,
            Douthwaite,K.J., Draper,H., Dugan-Rocha,S., Durbin,K.J.,
            Earnhart,C., Edgar,D., Edwards,C.C., Elhaj,C., Escotto,M.,
            Falls,T., Ferraguto,D., Flagg,N., Ford,J., Foster,P., Frantz,P.,
            Gabisi,A., Gao,J., Garcia,A., Garner,T., Garza,N., Gill,R.,
            Gorrell,J.H., Guevara,W., Gunaratne,P., Hale,S., Hamilton,K.,
            Harris,C., Harris,K., Hart,M., Havlak,P., Hawes,A., He,X.,
            Hernandez,J., Hernandez,O., Hodgson,A., Hogues,M., Holloway,C.,
            Hollins,B., Homsi,F., Howard,S., Huber,J., Hulyk,S., Hume,J.,
            Jackson,L.E., Jacobson,B., Jia,Y., Johnson,R., Jolivet,S.,
            Joudah,S., Karlsson,E., Kelly,S., Khan,U., King,L., Korvah,J.,
            Kovar,C., Kratovic,J., Kureshi,A., Landry,N., Leal,B., Lewis,L.C.,
            Lewis,L., Li,J., Li,Z., Lichtarge,O., Lieu,C., Liu,J., Liu,W.,
            Loulseged,H., Lozado,R.J., Lu,X., Lucier,A., Lucier,R., Luna,R.,
            Ma,J., Maheshwari,M., Mapua,P., Martin,R., Martindale,A.,
            Martinez,E., Massey,E., Mawhiney,E., McLeod,M.P., Meador,M.,
            Mei,G., Metzker,M., Miner,G., Miner,Z., Mitchell,T., Mohabbat,K.,
            Moore,S., Morgan,M., Moorish,T., Morris,S., Moser,M., Neal,D.,
            Nelson,D., Newtson,J., Newtson,N., Nguyen,A., Nguyen,N., Nguyen,N.,
            Nickerson,E., Nwokenkwo,S., Oguh,M., Okwuonu,G., Oragunye,N.,
            Oviedo,R., Pace,A., Payton,B., Peery,J., Perez,L., Peters,L.,
            Pickens,R., Primus,E., Pu,L.L., Quiles,M., Ren,Y., Rives,M.,
            Rojas,A., Rojubokan,I., Rolfe,M., Ruiz,S., Savery,G., Scherer,S.,
            Scott,G., Shen,H., Shooshtari,N., Sisson,I., Sodergren,E.,
            Sonaike,T., Sparks,A., Stanley,H., Stone,H., Sutton,A., Svatek,A.,
            Tabor,P., Tamerisa,A., Tamerisa,K., Tang,H., Tansey,J., Taylor,C.,
            Taylor,T., Telfrod,B., Thomas,N., Thomas,S., Usmani,K., Vasquez,L.,
            Vera,V., Villalon,D., Vinson,R., Wang,Q., Wang,S., Ward-Moore,S.,
            Warren,R., Washington,C., Watlington,S., Williams,G.,
            Williamson,A., Wleczyk,R., Wooden,S., Worley,K., Wu,C., Wu,Y.,
            Wu,Y.F., Zhou,J., Zorrilla,S., Naylor,S.L., Weinstock,G. and
            Gibbs,R.
  TITLE     Direct Submission
  JOURNAL   Unpublished

REFERENCE   2  (bases 1 to 132745)
  AUTHORS   Worley,K.C.
  TITLE     Direct Submission
  JOURNAL   Submitted (06-MAY-2000) Human Genome Sequencing Center, Department
            of Molecular and Human Genetics, Baylor College of Medicine, One
            Baylor Plaza, Houston, TX 77030, USA
REFERENCE   3  (bases 1 to 132745)
  AUTHORS   Worley,K.C.
  TITLE     Direct Submission
  JOURNAL   Submitted (26-MAR-2002) Human Genome Sequencing Center, Department
            of Molecular and Human Genetics, Baylor College of Medicine, One
            Baylor Plaza, Houston, TX 77030, USA
REFERENCE   4  (bases 1 to 132745)
  AUTHORS   Worley,K.C.
  TITLE     Direct Submission
  JOURNAL   Submitted (28-MAR-2002) Human Genome Sequencing Center, Department
            of Molecular and Human Genetics, Baylor College of Medicine, One
            Baylor Plaza, Houston, TX 77030, USA
REFERENCE   5  (bases 1 to 132745)
  AUTHORS   Worley,K.C.
  TITLE     Direct Submission
  JOURNAL   Submitted (29-MAR-2002) Human Genome Sequencing Center, Department
            of Molecular and Human Genetics, Baylor College of Medicine, One
            Baylor Plaza, Houston, TX 77030, USA
REFERENCE   6  (bases 1 to 132745)
  AUTHORS   Worley,K.C.
  TITLE     Direct Submission
  JOURNAL   Submitted (25-JUN-2002) Human Genome Sequencing Center, Department
            of Molecular and Human Genetics, Baylor College of Medicine, One
            Baylor Plaza, Houston, TX 77030, USA
REFERENCE   7  (bases 1 to 132745)
  AUTHORS   Worley,K.C.
  TITLE     Direct Submission
  JOURNAL   Submitted (24-JUL-2002) Human Genome Sequencing Center, Department
            of Molecular and Human Genetics, Baylor College of Medicine, One
            Baylor Plaza, Houston, TX 77030, USA

On Mar 28, 2002 this sequence version replaced gi:19718616.
INFORMATION: http://www.hgsc.bcm.tmc.edu/ or email
gc-help@bcm.tmc.edu



CLONE LENGTH: This sequence does not necessarily represent the
entire insert of this clone. Overlapping regions of clones are only
sequenced and submitted once, so the sequence for the remainder of
the insert may be found in the record for the adjacent clones.
Overlapping clones are noted at the beginning and end of the
Features listing.



ANNOTATION OF FEATURES: 

STSs are identified using ePCR (Genome Res. 7:541-550) searches 
of a local database that includes entries from dbSTS, GDB, and 
local mapping efforts. 

Repeats are identified using RepeatMasker (A. Smit and P. Green, 
unpublished.) for Human and Mouse sequences. 

Genes and Region of sequence similarity are identified by BLAST 
(Nuc. Acids Res. 25:3389-3402) similarity (expect < 1e-34) to the
EST and cDNA sequences. Genes demonstrate at least two exons 
flanked by consensus splice sites that maintained sequence 
continuity across the splice junctions. Sequences that are not 
identical matches are annotated as similar. 



SEQUENCING READ COVERAGE: Sequencing is completed to a minimum
standard of double strand coverage with a minimum of 2 clones and 2 
reads with no ambiguities or 2 chemistries with a minimum of 2 
clones and 3 reads with no ambiguities. If the sequence quality for 
a region does not meet this standard, it will be indicated in the 
annotation as Low Coverage. 

QUALITY OF INDIVIDUAL BASES: This sequence meets stringent quality 
standards - estimated error rate less than 1 per 10,000 bases. 
Reports of lowest quality individual bases and measures of base 
quality are listed below. Description of the metrics can be found 
at URL: 

http://gc.bcm.tmc.edu:8088/quality.info/genbank.annotation.html.



QUALSTAT-REPORT.
FEATURES             Location/Qualifiers
     source          1..132745
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="3"
                     /clone="RP11-64D22"
     misc_feature    1..2005
                     /note="overlaps bases 170209..172213 of clone AC069067"
                     /function="clone overlap"
     STS             30..130
                     /standard_name="74493"
     repeat_region   complement(522..1015)
                     /rpt_family="MLT1D"
     repeat_region   complement(2452..2697)
                     /rpt_family="L1MA5A"
     repeat_region   complement(3200..3578)
                     /rpt_family="MLT1B"
     repeat_region   3600..3749
                     /rpt_family="(TA)n"
     repeat_region   4391..4411
                     /rpt_family="AT_rich"
     repeat_region   4909..4960
                     /rpt_family="AT_rich"
     repeat_region   complement(5657..6403)
                     /rpt_family="L1PA13"
     repeat_region   6404..7799
                     /rpt_family="L1PA13"
     STS             8045..8318
                     /standard_name="183647"
     repeat_region   complement(8708..9282)
                     /rpt_family="L1MD3"
     repeat_region   complement(9287..9357)
                     /rpt_family="MLT1F1"
     repeat_region   complement(9359..9460)
                     /rpt_family="L1MC3"
     repeat_region   complement(9587..9880)
                     /rpt_family="AluSg"
     repeat_region   complement(10203..10450)
                     /rpt_family="L1MC4"
     repeat_region   11468..11699
                     /rpt_family="L1M4"
     repeat_region   11717..11886
                     /rpt_family="MLT2CB"
     repeat_region   11908..11982
                     /rpt_family="Tigger3(Golem)"
     repeat_region   12020..12246
                     /rpt_family="THE1C"
     repeat_region   12263..12562
                     /rpt_family="AluSx"
     repeat_region   13326..13346
                     /rpt_family="AT_rich"
     repeat_region   complement(13934..14245)
                     /rpt_family="AluSx"
     repeat_region   complement(14256..14568)
                     /rpt_family="AluJo"
     repeat_region   14618..14746
                     /rpt_family="MER8"
     repeat_region   14825..14849
                     /rpt_family="AT_rich"
     repeat_region   14865..14906
                     /rpt_family="AT_rich"
     repeat_region   15465..15739
                     /rpt_family="L2"
     repeat_region   16579..16756
                     /rpt_family="(TTATA)n"
     repeat_region   complement(16757..17074)
                     /rpt_family="L2"
     repeat_region   17621..17660
                     /rpt_family="(CAAAA)n"
     repeat_region   18544..18725



Query Match 38.3%; Score 590.2; DB 9; Length 132745;
Best Local Similarity 75.5%; Pred. No. 6.1e-121;
Matches 760; Conservative 0; Mismatches 243; Indels 4; Gaps 2;



Qy         46 GGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTA 105
              I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db     123065 GGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAAAGTACTA 123124

Qy        106 CCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGT 165
              I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | | | | | 1 I I I I I
Db     123125 CCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCATTGTTGT 123184

Qy        166 GTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCT 225
              I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I IN I I I I I I I I I I I I I I
Db     123185 TTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCTTTAACCT 123244

Qy        226 TTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAA 285
              II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I I I I I
Db     123245 CTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTTATGCCAA 123304

Qy        286 TGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAA 345
              II II III I I I I I I I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db     123305 TGGAAACTGGATATATGGAGACGTGCTCTGCATAAGCAACCGATATGTGCTTCATGCCAA 123364

Qy        346 CCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAA 405
              I I I I I I I I I I I I I I I I I I I I I I I I I I I II Mill II I I I I I II I II II
Db     123365 CCTCTATACCAGCATTCTCTTTCTCACTTTTATCAGCATAGATCGATACTTGATAATTAA 123424

Qy        406 GTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGC 465
              III I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I
Db     123425 GTATCCTTTCCGAGAACACCTTCTGCAAAAGAAAGAGTTTGCTATTTTAATCTCCTTGGC 123484

Qy        466 TGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCC 525
              I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db     123485 CATTTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATCCTGTTAT 123544

Qy        526 AAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAACACAATCT 585
              II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II
Db     123545 AACTGACAATGGCACCACCTGTAATGATTTTGCAAGTTCTGGAGACCCCAACTACAACCT 123604

Qy        586 CATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTT 645
              I I I I I I I I I I I II II II II I I I I I I I I I I I I I I I I I I I I I I M I I I I II
Db     123605 CATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGATGTGTTT 123664

Qy        646 CTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCT 705
              I I I I I I I I I I I I I I I I I I I I I I I I I I | | | | | | | I I I I I I I I I
Db     123665 CTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTACTGCTCT 123724

Qy        706 GCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTT 765
              I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db     123725 GCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTGTGCTTTT 123784

Qy        766 CACACCCTATCATATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTTGGCC 822
              I I I I I I I I I I I I I I I I I I III MINIUM I I I I I M I M I I I II II
Db     123785 TACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTTGGAAGCA 123844

Qy        823 ACAAGGATGTACACAGAAGGCCATCAAATCTATATACACACTGACACGGCCTCTGGCCTT 882
              I I I I I M I I I I I I I I II I I I I I I I I I II I II I I I I M I I I
Db     123845 GTATCAGTGCACTCAGGTCGTCATCAACTCCTTTTACATTGTGACACGGCCTTTGGCCTT 123904

Qy        883 TCTGAACAGTGCCATCAATCCCATCTTCTACTTCCTCATGGGAGACCATTACAGAGAGAT 942
              I I I I I I I I I I I II I I I I II I I I I I I I II II I I I I I I I II I Ml II II
Db     123905 TCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCAGGGACAT 123964

Qy        943 GCTGATTAGTAAGTTCAGACAATACTTCAAGTCCCTTACATCCTTCAGGACATGAGCTGC 1002
              I I I I I I I I I I I I I I I I I I I II I II I II I I I I I I I II M I II I III
Db     123965 GCTGATGAATCAACTGAGACACAACTTCAAATCCCTTACATCCTTTAGCAGATGGGCTCA 124024

Qy       1003 TGGATGCAGGTCTTCACTCAGCCAAAA-TGAGACACTTGATAAACAG 1048
              III I I II I M I I I I I I I I I I I I I I INI
Db     124025 TGAACTCCTACTTTCATTCAGAGAAAAGTGAGGGGCTTGTGAAACAG 124071



RESULT 15 

AR035943 

LOCUS       AR035943                1996 bp    DNA     linear   PAT 29-SEP-1999
DEFINITION  Sequence 1 from patent US 5871963.
ACCESSION   AR035943
VERSION     AR035943.  GI:5952611
KEYWORDS
SOURCE      Unknown.
  ORGANISM  Unknown.
            Unclassified.
REFERENCE   1  (bases 1 to 1996)
  AUTHORS   Conley,P.B. and Jantzen,H.-M.
  TITLE     P2u2 purinergic receptor and nucleic acid encoding the receptor
  JOURNAL   Patent: US 5871963-A 1 16-FEB-1999;
FEATURES             Location/Qualifiers
     source          1..1996
                     /organism="unknown"
BASE COUNT      513 a    455 c    381 g    647 t
ORIGIN



Query Match 38.2%; Score 589.2; DB 6; Length 1996;
Best Local Similarity 75.1%; Pred. No. 8.8e-121;
Matches 762; Conservative 0; Mismatches 248; Indels 4; Gaps 2;



Qy         39 GCAGAATGGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATA 98
              II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I
Db        632 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 691

Qy         99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158
              I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II
Db        692 AGTACTACCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCA 751

Qy        159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218
              III II I II I I I I I I I I I I I I I I I I I I I I II I I I I I I I I III I I I I I I I
Db        752 TTGTTGTTTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCT 811

Qy        219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278
              I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db        812 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 871

Qy        279 ATGCCAATGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTC 338
              I I I I I I I I I M Ml I I I I I I I I II I I I I I I I I I I I I U I I I I I I I I I I I I I
Db        872 ATGCCAATGGAAACTGGATATATGGAGACGTGCTCTGCATAAGCAACCGATATGTGCTTC 931

Qy        339 ACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGC 398
              I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II
Db        932 ATGCCAACCTCTATACCAGCATTCTCTTTCTCACTTTTATCAGCATAGATCGATACTTGA 991

Qy        399 TCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCT 458
              I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I
Db        992 TAATTAAGTATCCTTTCCGAGAACACCTTCTGCAAAAGAAAGAGTTTGCTATTTTAATCT 1051

Qy        459 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518
              I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 1)1111
Db       1052 CCTTGGCCATTTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 1111

Qy        519 CTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAAC 578
              I I I I II II I I I I I I I I I I M I I I I I I I I M I I I I I I I I I
Db       1112 CTGTTATAACTGACAATGGCACCACCTGTAATGATTTTGCAAGTTCTGGAGACCCCAACT 1171

Qy        579 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638
              I I I I I I I I I I I I I I I I I II II II I I I I I I I I I I If I I I I I I I I I I I I I I
Db       1172 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 1231

Qy        639 TGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAA 698
              I I I I I I M I I I I I I I I I I I I I I I II I I I I I I I I I I I I M III
Db       1232 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 1291

Qy        699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758
              I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I
Db       1292 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 1351

Qy        759 TACTCTTCACACCCTATCATATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTT 818
              I II II I I I I I I I I i I I 11111:1 III I I I I I I I I I I I I II I I I I I I I I I I
Db       1352 TGCTTTTTACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTT 1411

Qy        819 GGCCACAAGGATGTACACAGAAGGCCATCAAATCTATATACACACTGACACGGCCTC 875
              I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I
Db       1412 GGAAGCAGTATCAGTGCACTCAGGTCGTCATCAACTCCTTTTACATTGTGACACGGGCTT 1471

Qy        876 TGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTCCTCATGGGAGACCATTACA 935
              III I I I I I I I I I I I I I I I I I I I I II M I I I I I II II I I I I I I I II I II
Db       1472 TGGGCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 1531

Qy        936 GAGAGATGCTGATTAGTAAGTTCAGACAATACTTCAAGTCCCTTACATCCTTCAGGACAT 995
              I I I I I I I I I I I I I I MINI I I I I I II I I I I I I I I I I I I I I I I I I I
Db       1532 GGGACATGCTGATGAATCAACTGAGACACAACTTCAAATCCCTTACATCCTTTAGCAGAT 1591

Qy        996 GAGCTGCTGGATGCAGGTCTTCACTCAGCCAAAA-TGAGACACTTGATAAACAG 1048
              I I I I III I I I I I I I I I I I I I I I I I I I I I I I M I I
Db       1592 GGGCTCATGAACTCCTACTTTCATTCAGAGAAAAGTGAGGGGCTTGTGAAACAG 1645



Search completed: December 14, 2003, 15:00:20 
Job time : 5884 secs



GenCore version 5.1.6
Copyright (c) 1993 - 2003 Compugen Ltd. 



OM nucleic - nucleic search, using sw model

Run on: December 14, 2003, 13:14:29 ; Search time 3520 Seconds
(without alignments)
10653.922 Million cell updates/sec

Title: US-09-891-138A-1
Perfect score: 1543
Sequence: 1 gctcctggcagagttttctg...tgcctaaataaatcaatata 1543

Scoring table: IDENTITY_NUC
Gapop 10.0 , Gapext 1.0

Searched: 22781392 seqs, 12152238056 residues

Total number of hits satisfying chosen parameters: 45562784

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries
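The throughput figure in the search header can be checked against the other header values. The following is a minimal sketch, assuming that one "cell update" is one query-residue by database-residue comparison and that both strands of each database sequence are searched. The second assumption is an inference from the hit count (45562784) being exactly twice the number of sequences searched (22781392); the report does not state it. The function name is hypothetical.

```python
def million_cell_updates_per_sec(query_len, db_residues, seconds, strands=2):
    """Dynamic-programming cells filled per second, in millions.

    Assumptions (not stated in the report): one cell per
    query-residue/database-residue pair, and `strands` passes over
    the database (2 = forward and reverse complement).
    """
    cells = query_len * db_residues * strands
    return cells / seconds / 1e6


# Header values: 1543-residue query, 12152238056 database residues,
# search time 3520 seconds.
rate = million_cell_updates_per_sec(1543, 12152238056, 3520)
print(round(rate, 3))  # 10653.922
```

The result agrees with the printed 10653.922 Million cell updates/sec, which is consistent with a two-strand search.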



EST:*
1: em_estba:*
2: em_esthum:*
3: em_estin:*
4: em_estmu:*
5: em_estov:*
6: em_estpl:*
7: em_estro:*
8: em_htc:*
9: gb_est1:*
10: gb_est2:*
11: gb_htc:*
12: gb_est3:*
13: gb_est4:*
14: gb_est5:*
15: em_estfun:*
16: em_estom:*
17: em_gss_hum:*
18: em_gss_inv:*
19: em_gss_pln:*
20: em_gss_vrt:*
21: em_gss_fun:*
22: em_gss_mam:*
23: em_gss_mus:*
24: em_gss_pro:*
25: em_gss_rod:*
26: em_gss_phg:*
27: em_gss_vrl:*
28: gb_gss1:*
29: gb_gss2:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES 

% 

Result Query 



tfo . 
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10 
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3 


516.4 


33 


. 5 
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4 95.8 


32 
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52 0 


9 
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4 55 


29 


.5 


4 69 


10 


BB744 515 


RR744515 BB744515 




4 3 8 


28 


.4 


45 8 


10 


RR746222 
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7 


414 


26 


.8 


42 8 


10 
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4 03 . 8 


26 


.2 


422 


10 


RRS4791 ft 
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RRR4791ft PRR4791 ft 


9 


388.4 


25 


.2 


42 0 


10 


RB864882 


RRR64ftft9 RRR64RR9 


10 


384.8 


24 


. 9 


42 6 


10 


RR778587 


BB77 8 5 87 RR77R5R7 


11 


3 80 . 4 


24 


.7 


3 96 


10 


D -D / — > J?*± O 


RR7394R2 RR7394R2 


12 


3 63 . 8 


23 


.6 


3 67 


9 
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.r-r.x U ^ _y — j x U.J\-/j / i_. x v . -A. 
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3 5 7.6 


2 3 


.2 


63 6 


10 
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UXJ Ut: Ji6 / " 
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3 5 4 . 2 


23 


. 0 


416 


10 


RR ft 4 66 0 8 

J_> JJ U T U W U U 
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15 


3 50.6 
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4 08 
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. 1 


3 9 7 


1 0 
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2 96 


19 
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37 7 
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TariA 09099 

jut: u Z U J? 
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2 94 . 6 


19 
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o Z J 
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BY005778 BY00577B 


31 


215 . 4 


14 


. 0 


282 


10 


BB215653 


BB215653 BB215653. 


32 


214 


13 


. 9 


312 


10 


BB498898 


BB498898 BB498898 


3 3 


202 


13 
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. 9 
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10 
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. 5 
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36 
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. 3 
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.8 


1026 


29 


GNS051MY 


AL3 17059 Tetraodon 
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.3 
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. 1 
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~» 

O 


13 9 . 3 


9 , 
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ALIGNMENTS 



RESULT 1 
AK080866 
LOCUS       AK080866                1585 bp    mRNA    linear   HTC 05-DEC-2002
DEFINITION  Mus musculus 4 days neonate male adipose cDNA, RIKEN full-length
            enriched library, clone:B430012O21 product:G-PROTEIN COUPLED
            RECEPTOR GPR91, full insert sequence.
ACCESSION   AK080866
VERSION     AK080866.1  GI:26099527
KEYWORDS    HTC; CAP trapper.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1
  AUTHORS   Carninci,P. and Hayashizaki,Y.
  TITLE     High-efficiency full-length cDNA cloning
  JOURNAL   Meth. Enzymol. 303, 19-44 (1999)
  MEDLINE   99279253
   PUBMED   10349636
REFERENCE   2
  AUTHORS   Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
  TITLE     Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new genes
  JOURNAL   Genome Res. 10 (10), 1617-1630 (2000)
  MEDLINE   20499374
   PUBMED   11042159
REFERENCE   3
  AUTHORS   Shibata,K., Itoh,M., Aizawa,K., Nagaoka,S., Sasaki,N., Carninci,P.,
            Konno,H., Akiyama,J., Nishi,K., Kitsunai,T., Tashiro,H., Itoh,M.,
            Sumi,N., Ishii,Y., Nakamura,S., Hazama,M., Nishine,T., Harada,A.,
            Yamamoto,R., Matsumoto,H., Sakaguchi,S., Ikegami,T., Kashiwagi,K.,
            Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E., Watahiki,M.,
            Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T., Matsuura,S., Kawai,J.,
            Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A. and Hayashizaki,Y.
  TITLE     RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer
  JOURNAL   Genome Res. 10 (11), 1757-1771 (2000)
  MEDLINE   20530913
   PUBMED   11076861
REFERENCE   4
  AUTHORS   Kawai,J., Shinagawa,A., Shibata,K., Yoshino,M., Itoh,M., Ishii,Y.,
            Arakawa,T., Hara,A., Fukunishi,Y., Konno,H., Adachi,J., Fukuda,S.,
            Aizawa,K., Izawa,M., Nishi,K., Kiyosawa,H., Kondo,S., Yamanaka,I.,
            Saito,T., Okazaki,Y., Gojobori,T., Bono,H., Kasukawa,T., Saito,R.,
            Kadota,K., Matsuda,H., Ashburner,M., Batalov,S., Casavant,T.,
            Fleischmann,W., Gaasterland,T., Gissi,C., King,B., Kochiwa,H.,
            Kuehl,P., Lewis,S., Matsuo,Y., Nikaido,I., Pesole,G.,
            Quackenbush,J., Schriml,L.M., Staubli,F., Suzuki,R., Tomita,M.,
            Wagner,L., Washio,T., Sakai,K., Okido,T., Furuno,M., Aono,H.,
            Baldarelli,R., Barsh,G., Blake,J., Boffelli,D., Bojunga,N.,
            Carninci,P., de Bonaldo,M.F., Brownstein,M.J., Bult,C.,
            Fletcher,C., Fujita,M., Gariboldi,M., Gustincich,S., Hill,D.,
            Hofmann,M., Hume,D.A., Kamiya,M., Lee,N.H., Lyons,P.,
            Marchionni,L., Mashima,J., Mazzarelli,J., Mombaerts,P., Nordone,P.,
            Ring,B., Ringwald,M., Rodriguez,I., Sakamoto,N., Sasaki,H.,
            Sato,K., Schonbach,C., Seya,T., Shibata,Y., Storch,K.F., Suzuki,H.,
            Toyo-oka,K., Wang,K.H., Weitz,C., Whittaker,C., Wilming,L.,
            Wynshaw-Boris,A., Yoshida,K., Hasegawa,Y., Kawaji,H., Kohtsuki,S.
            and Hayashizaki,Y.
  TITLE     Functional annotation of a full-length mouse cDNA collection
  JOURNAL   Nature 409 (6821), 685-690 (2001)
  MEDLINE   21085660
   PUBMED   11217851
REFERENCE   5
  AUTHORS   The FANTOM Consortium and the RIKEN Genome Exploration Research
            Group Phase I & II Team.
  TITLE     Analysis of the mouse transcriptome based on functional annotation
            of 60,770 full-length cDNAs
  JOURNAL   Nature 420, 563-573 (2002)
REFERENCE   6  (bases 1 to 1585)
  AUTHORS   Adachi,J., Aizawa,K., Akimura,T., Arakawa,T., Bono,H., Carninci,P.,
            Fukuda,S., Furuno,M., Hanagaki,T., Hara,A., Hashizume,W.,
            Hayashida,K., Hayatsu,N., Hiramoto,K., Hiraoka,T., Hirozane,T.,
            Hori,F., Imotani,K., Ishii,Y., Itoh,M., Kagawa,I., Kasukawa,T.,
            Katoh,H., Kawai,J., Kojima,Y., Kondo,S., Konno,H., Kouda,M.,
            Koya,S., Kurihara,C., Matsuyama,T., Miyazaki,A., Murata,M.,
            Nakamura,M., Nishi,K., Nomura,K., Numazaki,R., Ohno,M., Ohsato,N.,
            Okazaki,Y., Saito,R., Saitoh,H., Sakai,C., Sakai,K., Sakazume,N.,
            Sano,H., Sasaki,D., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Tagami,M., Tagawa,A., Takahashi,F., Takaku-Akahira,S.,
            Takeda,Y., Tanaka,T., Tomaru,A., Toya,T., Yasunishi,A.,
            Muramatsu,M. and Hayashizaki,Y.
  TITLE
  JOURNAL
COMMENT
FEATURES
     source

Direct Submission • 
Submitted (16-APR-2002) Yoshihide Hayashizaki, The Institute of 
Physical and Chemical Research (RIKEN), Laboratory for Genome 
Exploration Research Group, RIKEN Genomic Sciences Center (GSC) , 
RIKEN Yokohama Institute; 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, 
Kanagawa 230-0045, Japan (E-mail : genome -re s@gsc . riken. go . jp, 
URL :http :/ /genome . gsc . riken . go . jp/, Tel : 81-45-503-9222 , 
Fax: 81-45-503-9216) 

cDNA library was prepared and sequenced in Mouse Genome 
Encyclopedia Project of Genome Exploration Research Group in Riken 
Genomic Sciences Center and Genome Science Laboratory in RIKEN. 
Division of Experimental Animal Research in Riken contributed to 
prepare mouse tissues. 

Please visit our web site for further details. 
URL :http : //genome . gsc . riken. go . jp/ 
URL : http : / / f antom . gsc . riken . go . jp/ . 

Location/ Qualifiers 

1. .1585 

/organism- "Mus musculus" 

/ mol_type= "mRNA" 

/strain= n C57BL/6J n 

/ db_xre f = " FANT0M_DB :B430012O21" 

/ db_xr e f = " t axon : 1 0 0 9 0 " 

/clone- "B4 3 001202 1" 

/sex= "male" 

/ tissue_type= n adipose " 



/clone_lib="RIKEN full-length enriched mouse cDNA library 
/dev_stage= "4 days neonate" 
misc_f eature 69. .1025 

/note="G-PROTEIN COUPLED RECEPTOR GPR91 (SPTR | Q99MT6 , 
evidence: FASTY, 94.3%ID, 100%length, match=954) 
putative" 
polyA_ signal 1558. .1563 

/note="putative" 
polyA__site 1585 

/ note= "putative " 
BASE COUNT 450 a 351 c 305 g 477 t 2 others 

ORIGIN 

Query Match 96.2%; Score 1484.8; DB 11; Length 1585;
Best Local Similarity 98.4%; Pred. No. 1.5e-296;
Matches 1521; Conservative 0; Mismatches 22; Indels 3; Gaps 2;
Qy      1 GCTCCTGGCAGAGTTTTCTGTCGAGACAGAAGCCGACAGCAGAATGGCACAGAATTTATC 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     26 GCTCCTGGCAGAGTTTTCTGTCGAGACAGAAGCCGACAGCAGAATGGCACAGAATTTATC 85

Qy     61 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTTA 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||
Db     86 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGGATTTTA 145

Qy    121 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCTT 180
          ||||||||||||||||||||||||||||||||||||||||| |||||| || ||||| ||
Db    146 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGGGGTGTTTGGGTACCTGTT 205

Qy    181 CTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACTT 240
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    206 CTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACTT 265

Qy    241 TGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATAAGGGGACCTA 300
          |||||||||| ||||||||||||||||||||||||||| |||||||||||||||||||||
Db    266 TGCTTTCCTGGGCACCCTTCCCATCCTGATAAAGAGTTTTGCCAATGATAAGGGGACCTA 325

Qy    301 TGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCAGCAT 360
          |||||||||||| || |||||||||||||||| |||| |||||||||| || ||||||||
Db    326 TGGAGATGTTCTTTGGATAAGCAACCGATATGGGCTTAACACCAACCTTTAAACCAGCAT 385

Qy    361 CCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAA-GTACCCTTTCCGA 418

I I Mi ill 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II II I II II 

Db    386 CTTTTTCTTCATTTTCATTAGCATGGACCGATATCTGCTCATGAAAGTACCCTTTTCCGA 445

Qy    419 GAACAC-TTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCTT 477
          |||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    446 GAACACTTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCTT 505

Qy    478 AGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCCAAAAGAAGAGGG 537
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    506 AGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCCAAAAGAAGAGGG 565

Qy    538 CAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAACACAATCTCATTTACAGCCT 597
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    566 CAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAACACAATCTCATTTACAGCCT 625

Qy    598 CTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACAA 657
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    626 CTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACAA 685

Qy    658 GATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACAA 717
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    686 GATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACAA 745

Qy    718 ACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATCA 777
          |||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||
Db    746 ACCCCAACGCCTGGTGGTCCTGGCAGTTGTGATCTTCTCTATACTCTTCACACCCTATCA 805

Qy    778 TATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTTGGCCACAAGGATGTACACA 837
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    806 TATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTTGGCCACAAGGATGTACACA 865

Qy    838 GAAGGCCATCAAATCTATATACACACTGACACGGCCTCTGGCCTTTCTGAACAGTGCCAT 897
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    866 GAAGGCCATCAAATCTATATACACACTGACACGGCCTCTGGCCTTTCTGAACAGTGCCAT 925

Qy    898 CAATCCCATCTTCTACTTCCTCATGGGAGACCATTACAGAGAGATGCTGATTAGTAAGTT 957
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    926 CAATCCCATCTTCTACTTCCTCATGGGAGACCATTACAGAGAGATGCTGATTAGTAAGTT 985

Qy    958 CAGACAATACTTCAAGTCCCTTACATCCTTCAGGACATGAGCTGCTGGATGCAGGTCTTC 1017
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    986 CAGACAATACTTCAAGTCCCTTACATCCTTCAGGACATGAGCTGCTGGATGCAGGTCTTC 1045

Qy   1018 ACTCAGCCAAAATGAGACACTTGATAAACAGTGCTGTGCAGTTGAGTTTTAACTAAGTAA 1077

M MMMMMMMM-MUM MMMI MMMM MMMI MMM 

Db   1046 ACTCAGCCAAAATGAGACACTTGATAAACAGTGCTGTGCAGTTGAGTTTTAACTA?^GTAA 1105

Qy   1078 ACCACCATTTCTAGGCTTTAGCTTTCCACCATCCTCCAACCCCCAGGGCTGGAGTACAAG 1137

illllMMIIIIIirMIIIIIIIIIMIIIIIIIIMIIIIMIMIiriililllM 

Db   1106 ACCACQATTTCTAGGCTTTAGCTTTCCACCATCCTCCAACCCCCAGGGCTGGAGTACAAG 1165

Qy   1138 CTGGGTCCACATGAATCAGAAGGCAGCTCTCTGTTCTGATTTTAGGTTATACCCAGAGTA 1197
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1166 CTGGGTCCACATGAATCAGAAGGCAGCTCTCTGTTCTGATTTTAGGTTATACCCAGAGTA 1225

Qy   1198 TGGAAAAAATAAGGCATGAGAAAGCATTGACATCTTCACTTAAGAACTGAACAAAAGAGA 1257
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1226 TGGAAAAAATAAGGCATGAGAAAGCATTGACATCTTCACTTAAGAACTGAACAAAAGAGA 1285

Qy   1258 ACAAATATTGTCAATGTTTGGACACTTAGGATCTGAAATCTTGGAAATTTTAAGACCTCT 1317
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1286 ACAAATATTGTCAATGTTTGGACACTTAGGATCTGAAATCTTGGAAATTTTAAGACCTCT 1345

Qy   1318 TTTTCTATCAGTGTAAAAGGAATACAAGATAGCTAGTTGCAAATGCTGAATGCATTTCAT 1377
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1346 TTTTCTATCAGTGTAAAAGGAATACAAGATAGCTAGTTGCAAATGCTGAATGCATTTCAT 1405

Qy   1378 CATTGGTCAGGTCGATAAGCGTGTTTCTGAAATAGTCTTATTTTTATTCTTGTAATATTA 1437
           |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1406 GATTGGTCAGGTCGATAAGCGTGTTTCTGAAATAGTCTTATTTTTATTCTTGTAATATTA 1465

Qy   1438 AAATTTATGTGAAAAATGAATATAATTCAATGTACAACATTAGATTTTCTATTTGAAAAT 1497
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1466 AAATTTATGTGAAAAATGAATATAATTCAATGTACAACATTAGATTTTCTATTTGAAAAT 1525

Qy   1498 TATATTTCTTGAAAAAATAACTGCTGTGCCTAAATAAATCAATATA 1543
          ||||||||||||||||||||||||||||||||||||||||||||||
Db   1526 TATATTTCTTGAAAAAATAACTGCTGTGCCTAAATAAATCAATATA 1571
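
The summary lines printed for each result can be cross-checked with simple arithmetic. The sketch below is an assumption inferred from the figures in this report, not a documented definition from the search program: it takes "Query Match" to be the hit score divided by the perfect score (1543 for this query) and "Best Local Similarity" to be matches divided by the aligned columns (matches + mismatches + indels). The function names are hypothetical.

```python
# Assumed relationships, inferred from the numbers in this report only.
PERFECT_SCORE = 1543  # "Perfect score" from the report header


def query_match(score, perfect=PERFECT_SCORE):
    """Percent of the perfect score reached by a hit, to one decimal."""
    return round(100.0 * score / perfect, 1)


def best_local_similarity(matches, mismatches, indels):
    """Percent of aligned columns that are exact matches, to one decimal."""
    aligned_columns = matches + mismatches + indels
    return round(100.0 * matches / aligned_columns, 1)


# Result 1: Score 1484.8; Matches 1521; Mismatches 22; Indels 3
print(query_match(1484.8), best_local_similarity(1521, 22, 3))
# Result 2: Score 560; Matches 596; Mismatches 5; Indels 3
print(query_match(560), best_local_similarity(596, 5, 3))
```

Under these assumptions the computed values agree with the percentages printed for Results 1 and 2 (96.2% / 98.4% and 36.3% / 98.7%).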



RESULT 2
BB323771
LOCUS       BB323771                 683 bp    mRNA    linear   EST 31-AUG-2001
DEFINITION  BB323771 RIKEN full-length enriched, 4 days neonate male adipose
            Mus musculus cDNA clone B430012O21 3', mRNA sequence.
ACCESSION   BB323771
VERSION     BB323771.2  GI:15411432
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 683)
  AUTHORS   Arakawa,T., Carninci,P., Fukuda,S., Furuno,M., Hanagaki,T.,
            Hara,A., Hiramoto,K., Hori,F., Ishii,Y., Ito,M., Kawai,J.,
            Konno,H., Kouda,M., Koya,S., Matsuyama,T., Miyazaki,A., Nomura,K.,
            Ohno,M., Okazaki,Y., Okido,T., Saito,R., Sakai,C., Sakai,K.,
            Sano,H., Sasaki,D., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Suzuki,H., Tagami,M., Tagawa,A., Takahashi,F.,
            Takeda,Y., Tanaka,T., Toya,T., Muramatsu,M. and Hayashizaki,Y.
  TITLE     RIKEN Mouse ESTs (Arakawa,T., et al. 2001)
  JOURNAL   Unpublished
COMMENT     On Jul 11, 2000 this sequence version replaced gi:9032085.
            Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/

            Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)

            wagi,K., Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E.,
            Watahiki,M., Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T.,
            Matsuura,S., Kawai,J., Okazaki,Y., Muramatsu,M., Inoue,Y.,
            Kira,A. and Hayashizaki,Y.
            RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome
            Res. 10 (11), 1757-1771 (2000)

            Konno,H., Fukunishi,Y., Shibata,K., Itoh,M., Carninci,P.,
            Sugahara,Y. and Hayashizaki,Y.
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)

            Yamanaka,I., Kiyosawa,H., Kondo,S., Saito,T., Shinagawa,A.,
            Aizawa,K., Fukuda,S., Hara,A., Itoh,M., Kawai,J., Shibata,K.,
            Arakawa,T., Ishii,Y. and Hayashizaki,Y.
            Mapping of 19032 mouse cDNAs on mouse chromosomes. J. Struct.
            Func. Genomics 2 pre, L72-L86 (2001)

            Please visit our web site (http://genome.gsc.riken.go.jp/) for
            further details.

            cDNA library was prepared and sequenced in Mouse Genome
            Encyclopedia Project of Genome Exploration Research Group in Riken
            Genomic Sciences Center and Genome Science Laboratory in RIKEN.
            Division of Experimental Animal Research in Riken contributed to
            prepare mouse tissues.
FEATURES             Location/Qualifiers
     source          1..683
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="B430012O21"
                     /sex="male"
                     /tissue_type="adipose"
                     /dev_stage="4 days neonate"
                     /lab_host="DH10B"
                     /clone_lib="RIKEN full-length enriched, 4 days neonate
                     male adipose"
                     /note="Site_1: SalI; Site_2: BamHI; cDNA library was
                     prepared and sequenced in Mouse Genome Encyclopedia
                     Project of Genome Exploration Research Group in Riken
                     Genomic Sciences Center and Genome Science Laboratory in
                     RIKEN. Division of Experimental Animal Research in Riken
                     contributed to prepare mouse tissues. 1st strand cDNA was
                     primed with a primer [5'
                     GAGAGAGAGAAGGATCCAAGAGCTCTTTTTTTTTTTTTTTTVN 3'], cDNA was
                     prepared by using trehalose thermo-activated reverse
                     transcriptase and subsequently enriched for full-length by
                     cap-trapper. cDNA went through one round of normalization
                     to Rot = 10.0 and subtraction to Rot = 229.0. Second
                     strand cDNA was prepared with the primer adapter of
                     sequence [5' GAGAGAGAGATTCTCGAGTTAATTAAATTAATCCCCCCCCCCCCC
                     3'], cDNA was cleaved with XhoI and BamHI. Vector: a
                     modified pBluescript KS(+) after bulk excision from Lambda
                     FLC I."
BASE COUNT      226 a    125 c    117 g    215 t
ORIGIN

Query Match 36.3%; Score 560; DB 10; Length 683;
Best Local Similarity 98.7%; Pred. No. 2.5e-105;
Matches 596; Conservative 0; Mismatches 5; Indels 3; Gaps 3;



Qy    943 GCTGATTAGTAAGTTCAGAC-AATACTTCAAG-TCCCTTACATCCTTC-AGGACATGAGC 999
          |||||||||||||||||| | ||||||||||| ||||||||||||||| ||||||| ||
Db     66 GCTGATTAGTAAGTTCAGCCAAATACTTCAAGTTCCCTTACATCCTTCAAGGACATAAGT 125

Qy   1000 TGCTGGATGCAGGTCTTCACTCAGCCAAAATGAGACACTTGATAAACAGTGCTGTGCAGT 1059
          |||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||
Db    126 TGCTGGATGCAGGTTTTCACTCAGCCAAAATGAGACACTTGATAAACAGTGCTGTGCAGT 185

Qy   1060 TGAGTTTTAACTAAGTAAACCACCATTTCTAGGCTTTAGCTTTCCACCATCCTCCAACCC 1119
          |||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||
Db    186 TGAGTTTTAATTAAGTAAACCACCATTTCTAGGCTTTAGCTTTCCACCATCCTCCAACCC 245

Qy   1120 CCAGGGCTGGAGTACAAGCTGGGTCCACATGAATCAGAAGGCAGCTCTCTGTTCTGATTT 1179
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    246 CCAGGGCTGGAGTACAAGCTGGGTCCACATGAATCAGAAGGCAGCTCTCTGTTCTGATTT 305

Qy   1180 TAGGTTATACCCAGAGTATGGAAAAAATAAGGCATGAGAAAGCATTGACATCTTCACTTA 1239
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    306 TAGGTTATACCCAGAGTATGGAAAAAATAAGGCATGAGAAAGCATTGACATCTTCACTTA 365

Qy   1240 AGAACTGAACAAAAGAGAACAAATATTGTCAATGTTTGGACACTTAGGATCTGAAATCTT 1299
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    366 AGAACTGAACAAAAGAGAACAAATATTGTCAATGTTTGGACACTTAGGATCTGAAATCTT 425

Qy   1300 GGAAATTTTAAGACCTCTTTTTCTATCAGTGTAAAAGGAATACAAGATAGCTAGTTGCAA 1359
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    426 GGAAATTTTAAGACCTCTTTTTCTATCAGTGTAAAAGGAATACAAGATAGCTAGTTGCAA 485

Qy   1360 ATGCTGAATGCATTTCATCATTGGTCAGGTCGATAAGCGTGTTTCTGAAATAGTCTTATT 1419
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    486 ATGCTGAATGCATTTCATCATTGGTCAGGTCGATAAGCGTGTTTCTGAAATAGTCTTATT 545

Qy   1420 TTTATTCTTGTAATATTAAAATTTATGTGAAAAATGAATATAATTCAATGTACAACATTA 1479
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    546 TTTATTCTTGTAATATTAAAATTTATGTGAAAAATGAATATAATTCAATGTACAACATTA 605

Qy   1480 GATTTTCTATTTGAAAATTATATTTCTTGAAAAAATAACTGCTGTGCCTAAATAAATCAA 1539
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    606 GATTTTCTATTTGAAAATTATATTTCTTGAAAAAATAACTGCTGTGCCTAAATAAATCAA 665

Qy   1540 TATA 1543
          ||||
Db    666 TATA 669



RESULT 3
BX527630
ID   BX527630   standard; RNA; EST; 556 BP.
XX
AC   BX527630;
XX
SV   BX527630.1
XX
DT   27-MAY-2003 (Rel. 75, Created)
DT   27-MAY-2003 (Rel. 75, Last updated, Version 1)
XX
DE   RZPD Mus musculus cDNA clone IMAGp998B194840 = IMAGE:1970226 5' EST.
XX
KW   EST; expressed sequence tag.
XX
OS   Mus musculus (house mouse)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
XX
RN   [1]
RP   1-556
RA   Heil O., Ebert L., Neubert P., Peters M., Radelof U., Schneider D.,
RA   Korn B.;
RT   ;
RL   Submitted (28-MAY-2003) to the EMBL/GenBank/DDBJ databases.
RL   RZPD Deutsches Ressourcenzentrum fuer Genomforschung GmbH Im Neuenheimer
RL   Feld 580, D-69120 Heidelberg, Germany
XX
CC   RZPD; IMAGp998B194840.
CC   RZPDLIB; I.M.A.G.E. cDNA Clone Collection;
CC   Mouse UnigeneSet - RZPD2 (RZPDLIB No. 981)
CC   http://www.rzpd.de/CloneCards/cgi-bin/showLib.pl.cgi/response?libNo=981
CC   Contact: Ina Rolfs
CC   RZPD Deutsches Ressourcenzentrum fuer Genomforschung GmbH
CC   Heubnerweg 6, D-14059 Berlin, Germany
CC   Tel: +49 30 32639 101
CC   Fax: +49 30 32639 111
CC   www.rzpd.de
CC   This clone is available royalty-free from RZPD;
CC   contact RZPD (clone@rzpd.de) for further information.
CC   Seq primer: siigF, Primer sequence: CTTCTGCTCTAAAAGCTGCG
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..556
FT                   /db_xref="taxon:10090"
FT                   /note="1st strand cDNA was primed with an oligo(dT) primer
FT                   [ATGTGGCCTTTTTTTTTTTTTTTTT]; double-stranded cDNA was
FT                   ligated to a DraIII adaptor [TGTTGGCCTACTGG], digested and
FT                   cloned into distinct DraIII sites of the pME18S-FL3 vector
FT                   (5' site CACTGTGTG, 3' site CACCATGTG). XhoI should be used
FT                   to isolate the cDNA insert. Size selection was performed to
FT                   exclude fragments <1.5kb. Library constructed by Dr. Sumio
FT                   Sugano (University of Tokyo Institute of Medical Science).
FT                   Custom primers for sequencing: 5' end primer
FT                   CTTCTGCTCTAAAAGCTGCG and 3' end primer
FT                   CGACCTGCAGCTCGAGCACA. REFERENCES: Suzuki, Y., Yoshitomo,
FT                   K., Maruyama, K., Suyama, A., and Sugano, S. Construction
FT                   and characterization of a full length-enriched and a 5' end
FT                   enriched cDNA library. Gene 200, 149-156, 1997. Sasaki, Z.,
FT                   Suzuki, Y., Watanabe, M., Imai, J., Shibui, A., Yoshida,
FT                   K., Hata, H., Yamaguchi, R., Tateyama, S., and Sugano, S.
FT                   Construction of mouse full length-enriched cDNA libraries
FT                   by oligo-capping. DNA Research, submitted."
FT                   /organism="Mus musculus"
FT                   /clone="IMAGp998B194840"
FT                   /clone_lib="Sugano mouse kidney mkia"
FT                   /dev_stage="adult"
FT                   /lab_host="DH10B"
XX
SQ   Sequence 556 BP; 134 A; 137 C; 111 G; 173 T; 1 other;

Query Match 33.5%; Score 516.4; DB 4; Length 556;
Best Local Similarity 99.8%; Pred. No. 2.5e-96;
Matches 517; Conservative 0; Mismatches 1; Indels 0; Gaps 0;

Qy      1 GCTCCTGGCAGAGTTTTCTGTCGAGACAGAAGCCGACAGCAGAATGGCACAGAATTTATC 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     39 GCTCCTGGCAGAGTTTTCTGTCGAGACAGAAGCCGACAGCAGAATGGCACAGAATTTATC 98

Qy     61 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTTA 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     99 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTTA 158

Qy    121 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCTT 180
          |||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||
Db    159 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTTGGCTACCTCTT 218

Qy    181 CTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACTT 240
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    219 CTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACTT 278

Qy    241 TGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATAAGGGGACCTA 300
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    279 TGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATAAGGGGACCTA 338

Qy    301 TGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCAGCAT 360
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    339 TGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCAGCAT 398

Qy    361 CCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAAGTACCCTTTCCGAGA 420
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    399 CCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAAGTACCCTTTCCGAGA 458

Qy    421 ACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCTTAGT 480
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    459 ACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCTTAGT 518

Qy    481 GACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518
          ||||||||||||||||||||||||||||||||||||||
Db    519 GACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 556



RESULT 4
AI663305
LOCUS       AI663305                 520 bp    mRNA    linear   EST 10-MAY-1999
DEFINITION  uk27c10.y1 Sugano mouse kidney mkia Mus musculus cDNA clone
            IMAGE:1970226 5' similar to SW:P2YR_RAT P49651 P2Y PURINOCEPTOR 1
            ;, mRNA sequence.
ACCESSION   AI663305
VERSION     AI663305.1  GI:4766888
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 520)
  AUTHORS   Marra,M., Hillier,L., Kucaba,T., Martin,J., Beck,C., Wylie,T.,
            Underwood,K., Steptoe,M., Theising,B., Allen,M., Bowers,Y.,
            Person,B., Swaller,T., Gibbons,M., Pape,D., Harvey,N., Schurk,R.,
            Ritter,E., Kohn,S., Shin,J., Jackson,Y., Cardenas,M., McCann,R.,
            Waterston,R. and Wilson,R.
  TITLE     The WashU-NCI Mouse EST Project 1999
  JOURNAL   Unpublished
COMMENT     Other ESTs: uk27c10.x1
            Contact: Marra M/WashU-NCI Mouse EST Project 1999
            Washington University School of Medicine
            4444 Forest Park Parkway, Box 3501, St. Louis, MO 63108, USA
            Tel: 314 286 1800
            Fax: 314 286 1810
            Email: mouseest@watson.wustl.edu
            This clone is available royalty-free through LLNL; contact the
            IMAGE Consortium (info@image.llnl.gov) for further information.
            MGI:986966
            Seq primer: custom primer used
            High quality sequence stop: 490.
FEATURES             Location/Qualifiers
     source          1..520
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:1970226"
                     /sex="female"
                     /dev_stage="adult"
                     /lab_host="DH10B"
                     /clone_lib="Sugano mouse kidney mkia"
                     /note="Organ: kidney; Vector: pME18S-FL3; Site_1: DraIII
                     (CACTGTGTG); Site_2: DraIII (CACCATGTG); 1st strand cDNA
                     was primed with an oligo(dT) primer
                     [ATGTGGCCTTTTTTTTTTTTTTTTT]; double-stranded cDNA was
                     ligated to a DraIII adaptor [TGTTGGCCTACTGG], digested
                     and cloned into distinct DraIII sites of the pME18S-FL3
                     vector (5' site CACTGTGTG, 3' site CACCATGTG). XhoI should
                     be used to isolate the cDNA insert. Size selection was
                     performed to exclude fragments <1.5kb. Library
                     constructed by Dr. Sumio Sugano (University of Tokyo
                     Institute of Medical Science). Custom primers for
                     sequencing: 5' end primer CTTCTGCTCTAAAAGCTGCG and 3' end
                     primer CGACCTGCAGCTCGAGCACA."
BASE COUNT      127 a    126 c    107 g    160 t
ORIGIN



Query Match 32.1%; Score 495.8; DB 9; Length 520;
Best Local Similarity 98.6%; Pred. No. 4.6e-92;
Matches 500; Conservative 0; Mismatches 7; Indels 0; Gaps 0;



Qy      1 GCTCCTGGCAGAGTTTTCTGTCGAGACAGAAGCCGACAGCAGAATGGCACAGAATTTATC 60
          |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||
Db     14 GCTCCTGGCAGAGTTTTCTGTCGAGACAGAAGCCGACAGCTGAATGGCACAGAATTTATC 73

Qy     61 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTTA 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     74 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTTA 133

Qy    121 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCTT 180
          |||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||
Db    134 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTTGGCTACCTCTT 193

Qy    181 CTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACTT 240
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    194 CTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACTT 253

Qy    241 TGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATAAGGGGACCTA 300
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    254 TGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATAAGGGGACCTA 313

Qy    301 TGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCAGCAT 360
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    314 TGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCAGCAT 373

Qy    361 CCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAAGTACCCTTTCCGAGA 420
           ||||| |||||| |||||| ||||||||||||||||||||||||||||||| |||||||
Db    374 GCTCTTGCTCACTGTCATTATCATGGACCGATATCTGCTCATGAAGTACCCTGTCCGAGA 433

Qy    421 ACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCTTAGT 480
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    434 ACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCTTAGT 493

Qy    481 GACCTTAGAAGTTCTACCCATGCTCAC 507
          |||||||||||||||||||||||||||
Db    494 GACCTTAGAAGTTCTACCCATGCTCAC 520



RESULT 5
BB744515
LOCUS       BB744515                 469 bp    mRNA    linear   EST 16-OCT-2001
DEFINITION  BB744515 RIKEN full-length enriched, adult male kidney Mus musculus
            cDNA clone F530003I24 3', mRNA sequence.
ACCESSION   BB744515
VERSION     BB744515.1  GI:16152351
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 469)
  AUTHORS   Akimura,T., Arakawa,T., Carninci,P., Furuno,M., Hanagaki,T.,
            Hayatsu,N., Hiramoto,K., Hiraoka,T., Hirozane,T., Imotani,K.,
            Ishii,Y., Ito,M., Kawai,J., Kojima,Y., Konno,H., Kouda,M.,
            Matsuyama,T., Nakamura,M., Nishi,K., Nomura,K., Numasaki,R.,
            Okazaki,Y., Okido,T., Saito,R., Sakai,C., Sakai,K., Sakazume,N.,
            Sasaki,D., Sato,K., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Suzuki,H., Tagawa,A., Takahashi,F., Takaku-Akahira,S.,
            Tanaka,T., Tomaru,A., Toya,T., Watahiki,A., Yasunishi,A.,
            Muramatsu,M. and Hayashizaki,Y.
  TITLE     RIKEN Encyclopedia of Mouse Full-length cDNAs (Akimura,T., et al.
            2001)
  JOURNAL   Unpublished
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/

            Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)

            wagi,K., Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E.,
            Watahiki,M., Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T.,
            Matsuura,S., Kawai,J., Okazaki,Y., Muramatsu,M., Inoue,Y.,
            Kira,A. and Hayashizaki,Y.
            RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome
            Res. 10 (11), 1757-1771 (2000)

            Konno,H., Fukunishi,Y., Shibata,K., Itoh,M., Carninci,P.,
            Sugahara,Y. and Hayashizaki,Y.
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)

            Please visit our web site (http://genome.gsc.riken.go.jp) for
            further details.

            e mouse tissues.
FEATURES             Location/Qualifiers
     source          1..469
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="F530003I24"
                     /sex="male"
                     /tissue_type="kidney"
                     /dev_stage="adult"
                     /lab_host="SOLR"
                     /clone_lib="RIKEN full-length enriched, adult male kidney"
                     /note="Site_1: XhoI; Site_2: SstI; cDNA library was
                     prepared and sequenced in Mouse Genome Encyclopedia
                     Project of Genome Exploration Research Group in Riken
                     Genomic Sciences Center and Genome Science Laboratory in
                     RIKEN. Division of Experimental Animal Research in Riken
                     contributed to prepare mouse tissues. 1st strand cDNA was
                     primed with a primer [5'
                     GAGAGAGAGAGCGGCCGCAACTCGAGTTTTTTTTTTTTTTTTVN 3'], cDNA was
                     prepared by using trehalose thermo-activated reverse
                     transcriptase and subsequently enriched for full-length by
                     cap-trapper. Second strand cDNA was prepared with the
                     primer adapter of sequence [5'
                     GAGAGAGAGAAGGATCCAAGAGCTCAATTAATTAATTAAAGCCCCCCCCCC 3'],
                     cDNA was cleaved with XhoI and SstI."
BASE COUNT      160 a     75 c     80 g    154 t
ORIGIN



Query Match 29.5%; Score 455; DB 10; Length 469;
Best Local Similarity 100.0%; Pred. No. 1.3e-83;
Matches 455; Conservative 0; Mismatches 0; Indels 0; Gaps 0;
Qy   1089 TAGGCTTTAGCTTTCCACCATCCTCCAACCCCCAGGGCTGGAGTACAAGCTGGGTCCACA 1148
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      1 TAGGCTTTAGCTTTCCACCATCCTCCAACCCCCAGGGCTGGAGTACAAGCTGGGTCCACA 60

Qy   1149 TGAATCAGAAGGCAGCTCTCTGTTCTGATTTTAGGTTATACCCAGAGTATGGAAAAAATA 1208
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     61 TGAATCAGAAGGCAGCTCTCTGTTCTGATTTTAGGTTATACCCAGAGTATGGAAAAAATA 120

Qy   1209 AGGCATGAGAAAGCATTGACATCTTCACTTAAGAACTGAACAAAAGAGAACAAATATTGT 1268
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    121 AGGCATGAGAAAGCATTGACATCTTCACTTAAGAACTGAACAAAAGAGAACAAATATTGT 180

Qy   1269 CAATGTTTGGACACTTAGGATCTGAAATCTTGGAAATTTTAAGACCTCTTTTTCTATCAG 1328
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    181 CAATGTTTGGACACTTAGGATCTGAAATCTTGGAAATTTTAAGACCTCTTTTTCTATCAG 240

Qy   1329 TGTAAAAGGAATACAAGATAGCTAGTTGCAAATGCTGAATGCATTTCATCATTGGTCAGG 1388
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    241 TGTAAAAGGAATACAAGATAGCTAGTTGCAAATGCTGAATGCATTTCATCATTGGTCAGG 300

Qy   1389 TCGATAAGCGTGTTTCTGAAATAGTCTTATTTTTATTCTTGTAATATTAAAATTTATGTG 1448
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    301 TCGATAAGCGTGTTTCTGAAATAGTCTTATTTTTATTCTTGTAATATTAAAATTTATGTG 360

Qy   1449 AAAAATGAATATAATTCAATGTACAACATTAGATTTTCTATTTGAAAATTATATTTCTTG 1508
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    361 AAAAATGAATATAATTCAATGTACAACATTAGATTTTCTATTTGAAAATTATATTTCTTG 420

Qy   1509 AAAAAATAACTGCTGTGCCTAAATAAATCAATATA 1543
          |||||||||||||||||||||||||||||||||||
Db    421 AAAAAATAACTGCTGTGCCTAAATAAATCAATATA 455



RESULT 6
BB746222
LOCUS       BB746222                 458 bp    mRNA    linear   EST 15-OCT-2001
DEFINITION  BB746222 RIKEN full-length enriched, adult male kidney Mus musculus
            cDNA clone F530013P03 3', mRNA sequence.
ACCESSION   BB746222
VERSION     BB746222.1  GI:16149159
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 458)
  AUTHORS   Akimura,T., Arakawa,T., Carninci,P., Furuno,M., Hanagaki,T.,
            Hayatsu,N., Hiramoto,K., Hiraoka,T., Hirozane,T., Imotani,K.,
            Ishii,Y., Ito,M., Kawai,J., Kojima,Y., Konno,H., Kouda,M.,
            Matsuyama,T., Nakamura,M., Nishi,K., Nomura,K., Numasaki,R.,
            Okazaki,Y., Okido,T., Saito,R., Sakai,C., Sakai,K., Sakazume,N.,
            Sasaki,D., Sato,K., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Suzuki,H., Tagawa,A., Takahashi,F., Takaku-Akahira,S.,
            Tanaka,T., Tomaru,A., Toya,T., Watahiki,A., Yasunishi,A.,
            Muramatsu,M. and Hayashizaki,Y.
  TITLE     RIKEN Encyclopedia of Mouse Full-length cDNAs (Akimura,T., et al.
            2001)
  JOURNAL   Unpublished
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216

FEATURES 

sourc 



BASE COUNT 
ORIGIN 



Email : genome -res@gsc . riken. go . jp, 
URL :http : / /genome .gsc . riken. go. jp/ 

Carninci, P. , Shibata,Y. # Hayatsu,N., Sugahara,Y., Shibata,K., Itoh 
,M., Konno,H., Okazaki,Y., Muramatsu, M. and Hayashizaki , Y . 

Normalization and subtraction of cap-trapper-selected cDNAs to 
prepare full-length cDNA libraries for rapid discovery of new 
genes. Genome Res. . 10 (10), 1617-1630 (2000) 

wagi,K., Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E., 
Watahiki,M., Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T., Matsuura 
,S., Kawai,J., Okazaki,Y., Muramatsu, M. , Inoue,Y., Kira,A. and 
Hayashi zaki , Y . 

RIKEN integrated sequence analysis (RISA) system- -384 -format 
sequencing pipeline with 3 84 multicapillary sequencer. Genome Res. . 
10 (11), 1757-1771 (2000) 

Konno,H., Fukunishi , Y . , Shibata,K., Itoh,M. , Carninci, P., Sugahara 
, Y. and Hayashi zaki , Y . 

Computer-based methods for the mouse full-length cDNA 
encyclopedia: real-time sequence clustering for construction of a 
nonredundant cDNA library. Genome Res. . 11 (2), 281-289 (2001) 

Please visit our web site (http://genome.gsc.riken.go.jp) for 
further details. 

e mouse tissues. 

Location/Qualifiers 
1. .458 

/organism="Mus musculus" 
/mol__type^"mRNA" 
/db_xref ="taxon: 10090" 
/clone= ?, F53 0013P03 " 
/sex^ : 'male" 

/ 1 i s sue_type -"kidney , 
/dev_stage= "adult" , 
/ 1 ab_ho st=" SOLR " 

/ clone_lib= "RIKEN full-length enriched, adult male kidney" 
/note="Site_l : Xhol ; Site_2 : SstI; cDNA library was 
prepared and sequenced in Mouse Genome Encyclopedia 
Project of Genome Exploration Research Group in Riken 
Genomic Sciences Center and Genome Science Laboratory in 
RIKEN. Division of Experimental Animal Research in Riken 
contributed to prepare mouse tissues. 1st strand cDNA was 
primed with a primer [5 1 

GAGAGAGAGAGCGGCCGCAACTCGAGTTTTTTTTTTTTTTTTVN 3 ' ] , cDNA was 
prepared by using trehalose thermo-activated reverse 
transcriptase and subsequently enriched for full-length by 
cap-trapper. Second strand cDNA was prepared with the 
primer adapter of sequence [5 1 

GAGAGAGAGAAGGATCCAAGAGCTCAATTAATTAATTAAACCCCCCCCCCC 3 ' ] . 
cDNA was cleaved with Xhol and SstI. *' 
150 a 75 c 82 g .151 t 



Query Match 28.4%; Score 438; DB 10; Length 458; 

Best Local Similarity 99.8%; Pred. No. 4.1e-80; 

Matches 449; Conservative 0; Mismatches 0; Indels 1; 



Gaps 



1; 



Qy       1058 GTTGAGTTTTAACTAAGTAAACCACCATTTCTAGGCTTTAGCTTTCCACCATCCTCCAAC 1117
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         10 GTTGAGTTTTAACTAAGTAAACCACCATTTCTAGGCTTTAGCTTTCCACCATCCTCCAAC 69

Qy       1118 CCCCAGGGCTGGAGTACAAGCTGGGTCCACATGAATCAGAAGGCAGCTCTCTGTTCTGAT 1177
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         70 CCCCAGGGCTGGAGTACAAGCTGGGTCCACATGAATCAGAAGGCAGCTCTCTGTTCTGAT 129

Qy       1178 TTTAGGTTATACCCAGAGTATGGAAAAAATAAGGCATGAGAAAGCATTGACATCTTCACT 1237
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        130 TTTAGGTTATACCCAGAGTATGGAAAAAATAAGGCATGAGAAAGCATTGACATCTTCACT 189

Qy       1238 TAAGAACTGAACAAAAGAGAACAAATATTGTCAATGTTTGGACACTTAGGATCTGAAATC 1297
              |||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        190 TAAG-ACTGAACAAAAGAGAACAAATATTGTCAATGTTTGGACACTTAGGATCTGAAATC 248

Qy       1298 TTGGAAATTTTAAGACCTCTTTTTCTATCAGTGTAAAAGGAATACAAGATAGCTAGTTGC 1357
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        249 TTGGAAATTTTAAGACCTCTTTTTCTATCAGTGTAAAAGGAATACAAGATAGCTAGTTGC 308

Qy       1358 AAATGCTGAATGCATTTCATCATTGGTCAGGTCGATAAGCGTGTTTCTGAAATAGTCTTA 1417
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        309 AAATGCTGAATGCATTTCATCATTGGTCAGGTCGATAAGCGTGTTTCTGAAATAGTCTTA 368

Qy       1418 TTTTTATTCTTGTAATATTAAAATTTATGTGAAAAATGAATATAATTCAATGTACAACAT 1477
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        369 TTTTTATTCTTGTAATATTAAAATTTATGTGAAAAATGAATATAATTCAATGTACAACAT 428

Qy       1478 TAGATTTTCTATTTGAAAATTATATTTCTT 1507
              ||||||||||||||||||||||||||||||
Db        429 TAGATTTTCTATTTGAAAATTATATTTCTT 458




RESULT 7
BB738743
LOCUS       BB738743     428 bp    mRNA    linear   EST 15-OCT-2001
DEFINITION  BB738743 RIKEN full-length enriched, 6 days neonate spleen Mus
            musculus cDNA clone F430109C18 3', mRNA sequence.
ACCESSION   BB738743
VERSION     BB738743.1  GI:16141748
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 428)
  AUTHORS   Akimura,T., Arakawa,T., Carninci,P., Furuno,M., Hanagaki,T.,
            Hayatsu,N., Hiramoto,K., Hiraoka,T., Hirozane,T., Imotani,K.,
            Ishii,Y., Ito,M., Kawai,J., Kojima,Y., Konno,H., Kouda,M.,
            Matsuyama,T., Nakamura,M., Nishi,K., Nomura,K., Numasaki,R.,
            Okazaki,Y., Okido,T., Saito,R., Sakai,C., Sakai,K., Sakazume,N.,
            Sasaki,D., Sato,K., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Suzuki,H., Tagawa,A., Takahashi,F., Takaku-Akahira,S.,
            Tanaka,T., Tomaru,A., Toya,T., Watahiki,A., Yasunishi,A.,
            Muramatsu,M. and Hayashizaki,Y.
  TITLE     RIKEN Encyclopedia of Mouse Full-length cDNAs (Akimura,T., et al.
            2001)
  JOURNAL   Unpublished
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)
            wagi,K., Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E.,
            Watahiki,M., Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T.,
            Matsuura,S., Kawai,J., Okazaki,Y., Muramatsu,M., Inoue,Y.,
            Kira,A. and Hayashizaki,Y.
            RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome Res.
            10 (11), 1757-1771 (2000)
            Konno,H., Fukunishi,Y., Shibata,K., Itoh,M., Carninci,P.,
            Sugahara,Y. and Hayashizaki,Y.
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
            Please visit our web site (http://genome.gsc.riken.go.jp) for
            further details.
            e mouse tissues.
FEATURES             Location/Qualifiers
     source          1..428
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="taxon:10090"
                     /clone="F430109C18"
                     /tissue_type="spleen"
                     /dev_stage="6 days neonate"
                     /clone_lib="RIKEN full-length enriched, 6 days neonate
                     spleen"
BASE COUNT      153 a     59 c     72 g    144 t
ORIGIN

  Query Match             26.8%;  Score 414;  DB 10;  Length 428;
  Best Local Similarity   100.0%;  Pred. No. 3.7e-75;
  Matches  414;  Conservative    0;  Mismatches    0;  Indels    0;  Gaps    0;

Qy       1130 AGTACAAGCTGGGTCCACATGAATCAGAAGGCAGCTCTCTGTTCTGATTTTAGGTTATAC 1189
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db          1 AGTACAAGCTGGGTCCACATGAATCAGAAGGCAGCTCTCTGTTCTGATTTTAGGTTATAC 60

Qy       1190 CCAGAGTATGGAAAAAATAAGGCATGAGAAAGCATTGACATCTTCACTTAAGAACTGAAC 1249
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         61 CCAGAGTATGGAAAAAATAAGGCATGAGAAAGCATTGACATCTTCACTTAAGAACTGAAC 120

Qy       1250 AAAAGAGAACAAATATTGTCAATGTTTGGACACTTAGGATCTGAAATCTTGGAAATTTTA 1309
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        121 AAAAGAGAACAAATATTGTCAATGTTTGGACACTTAGGATCTGAAATCTTGGAAATTTTA 180

Qy       1310 AGACCTCTTTTTCTATCAGTGTAAAAGGAATACAAGATAGCTAGTTGCAAATGCTGAATG 1369
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        181 AGACCTCTTTTTCTATCAGTGTAAAAGGAATACAAGATAGCTAGTTGCAAATGCTGAATG 240

Qy       1370 CATTTCATCATTGGTCAGGTCGATAAGCGTGTTTCTGAAATAGTCTTATTTTTATTCTTG 1429
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        241 CATTTCATCATTGGTCAGGTCGATAAGCGTGTTTCTGAAATAGTCTTATTTTTATTCTTG 300

Qy       1430 TAATATTAAAATTTATGTGAAAAATGAATATAATTCAATGTACAACATTAGATTTTCTAT 1489
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        301 TAATATTAAAATTTATGTGAAAAATGAATATAATTCAATGTACAACATTAGATTTTCTAT 360

Qy       1490 TTGAAAATTATATTTCTTGAAAAAATAACTGCTGTGCCTAAATAAATCAATATA 1543
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        361 TTGAAAATTATATTTCTTGAAAAAATAACTGCTGTGCCTAAATAAATCAATATA 414
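
Note on the summary figures: the Score, Query Match and Best Local Similarity values printed for each result appear to follow from the Matches, Mismatches, Indels and Gaps counts by simple arithmetic. The sketch below is a hedged reconstruction, not the search program's documented method. It takes the perfect score (1543) and the gap penalties (Gapop 10.0, Gapext 1.0) from the run header; the mismatch penalty of 0.6 is an assumption inferred from the printed scores and is not stated anywhere in the report. The function name `summarize` is hypothetical.

```python
PERFECT_SCORE = 1543      # "Perfect score" from the run header
GAP_OPEN = 10.0           # "Gapop" from the run header
GAP_EXTEND = 1.0          # "Gapext" from the run header
MISMATCH_PENALTY = 0.6    # assumption: inferred from the printed scores


def summarize(matches, mismatches, indels, gaps):
    """Recompute (Score, Query Match %, Best Local Similarity %) for one hit."""
    # Each match scores 1 under the identity table; every gap pays the
    # opening cost once and the extension cost per gapped position.
    score = (matches
             - MISMATCH_PENALTY * mismatches
             - GAP_OPEN * gaps
             - GAP_EXTEND * indels)
    query_match = 100.0 * score / PERFECT_SCORE
    similarity = 100.0 * matches / (matches + mismatches + indels)
    return round(score, 1), round(query_match, 1), round(similarity, 1)


if __name__ == "__main__":
    # (matches, mismatches, indels, gaps) as printed for RESULT 6, 7 and 10
    for counts in [(449, 0, 1, 1), (414, 0, 0, 0), (419, 2, 3, 3)]:
        print(counts, summarize(*counts))
```

Under these assumptions the three hits come out as (438.0, 28.4, 99.8), (414.0, 26.8, 100.0) and (384.8, 24.9, 98.8), which agrees with the values printed in the report.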



RESULT 8
BB847918
LOCUS       BB847918     422 bp    mRNA    linear   EST 26-NOV-2001
DEFINITION  BB847918 RIKEN full-length enriched, adult male kidney Mus musculus
            cDNA clone F530201F11 5', mRNA sequence.
ACCESSION   BB847918
VERSION     BB847918.1  GI:17086293
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 422)
  AUTHORS   Akimura,T., Arakawa,T., Carninci,P., Furuno,M., Hanagaki,T.,
            Hayatsu,N., Hiramoto,K., Hiraoka,T., Hirozane,T., Imotani,K.,
            Ishii,Y., Ito,M., Kawai,J., Kojima,Y., Konno,H., Kouda,M.,
            Matsuyama,T., Nakamura,M., Nishi,K., Nomura,K., Numasaki,R.,
            Okazaki,Y., Okido,T., Saito,R., Sakai,C., Sakai,K., Sakazume,N.,
            Sasaki,D., Sato,K., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Suzuki,H., Tagawa,A., Takahashi,F., Takaku-Akahira,S.,
            Tanaka,T., Tomaru,A., Toya,T., Watahiki,A., Yasunishi,A.,
            Muramatsu,M. and Hayashizaki,Y.
  TITLE     RIKEN Encyclopedia of Mouse Full-length cDNAs (Akimura,T., et al.
            2001)
  JOURNAL   Unpublished
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)
            wagi,K., Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E.,
            Watahiki,M., Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T.,
            Matsuura,S., Kawai,J., Okazaki,Y., Muramatsu,M., Inoue,Y.,
            Kira,A. and Hayashizaki,Y.
            RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome Res.
            10 (11), 1757-1771 (2000)
            Konno,H., Fukunishi,Y., Shibata,K., Itoh,M., Carninci,P.,
            Sugahara,Y. and Hayashizaki,Y.
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
            Please visit our web site (http://genome.gsc.riken.go.jp) for
            further details.
            e mouse tissues.
FEATURES             Location/Qualifiers
     source          1..422
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="F530201F11"
                     /sex="male"
                     /tissue_type="kidney"
                     /dev_stage="adult"
                     /lab_host="SOLR"
                     /clone_lib="RIKEN full-length enriched, adult male kidney"
                     /note="Site_1: XhoI; Site_2: SstI; cDNA library was
                     prepared and sequenced in Mouse Genome Encyclopedia
                     Project of Genome Exploration Research Group in Riken
                     Genomic Sciences Center and Genome Science Laboratory in
                     RIKEN. Division of Experimental Animal Research in Riken
                     contributed to prepare mouse tissues. 1st strand cDNA was
                     primed with a primer [5'
                     GAGAGAGAGAGCGGCCGCAACTCGAGTTTTTTTTTTTTTTTTVN 3'], cDNA was
                     prepared by using trehalose thermo-activated reverse
                     transcriptase and subsequently enriched for full-length by
                     cap-trapper. Second strand cDNA was prepared with the
                     primer adapter of sequence [5'
                     GAGAGAGAGAAGGATCCAAGAGCTCAATTAATTAATTAAACCCCCCCCCCC 3'].
                     cDNA was cleaved with XhoI and SstI."
BASE COUNT      104 a    100 c     88 g    130 t
ORIGIN

  Query Match             26.2%;  Score 403.8;  DB 10;  Length 422;
  Best Local Similarity   99.5%;  Pred. No. 4.8e-73;
  Matches  405;  Conservative    0;  Mismatches    2;  Indels    0;  Gaps    0;



Qy          1 GCTCCTGGCAGAGTTTTCTGTCGAGACAGAAGCCGACAGCAGAATGGCACAGAATTTATC 60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         16 GCTCCTGGCAGAGTTTTCTGTCGAGACAGAAGCCGACAGCAGAATGGCACAGAATTTATC 75

Qy         61 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTTA 120
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         76 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTTA 135

Qy        121 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCTT 180
              |||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||
Db        136 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTTGGCTACCTCTT 195

Qy        181 CTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACTT 240
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        196 CTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACTT 255

Qy        241 TGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATAAGGGGACCTA 300
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        256 TGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATAAGGGGACCTA 315

Qy        301 TGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCAGCAT 360
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        316 TGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCAGCAT 375

Qy        361 CCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAAGT 407
              ||||||||||||||||||||||||||||||||||||| |||||||||
Db        376 CCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGGTCATGAAGT 422



RESULT 9
BB864882
LOCUS       BB864882     420 bp    mRNA    linear   EST 27-NOV-2001
DEFINITION  BB864882 RIKEN full-length enriched, RCB-1283 B16 melanoma cDNA Mus
            musculus cDNA clone G430047C11 5', mRNA sequence.
ACCESSION   BB864882
VERSION     BB864882.1  GI:17111092
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 420)
  AUTHORS   Akimura,T., Arakawa,T., Carninci,P., Furuno,M., Hanagaki,T.,
            Hayatsu,N., Hiramoto,K., Hiraoka,T., Hirozane,T., Imotani,K.,
            Ishii,Y., Ito,M., Kawai,J., Kojima,Y., Konno,H., Kouda,M.,
            Matsuyama,T., Nakamura,M., Nishi,K., Nomura,K., Numasaki,R.,
            Okazaki,Y., Okido,T., Saito,R., Sakai,C., Sakai,K., Sakazume,N.,
            Sasaki,D., Sato,K., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Suzuki,H., Tagawa,A., Takahashi,F., Takaku-Akahira,S.,
            Tanaka,T., Tomaru,A., Toya,T., Watahiki,A., Yasunishi,A.,
            Muramatsu,M. and Hayashizaki,Y.
  TITLE     RIKEN Encyclopedia of Mouse Full-length cDNAs (Akimura,T., et al.
            2001)
  JOURNAL   Unpublished
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)
            wagi,K., Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E.,
            Watahiki,M., Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T.,
            Matsuura,S., Kawai,J., Okazaki,Y., Muramatsu,M., Inoue,Y.,
            Kira,A. and Hayashizaki,Y.
            RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome Res.
            10 (11), 1757-1771 (2000)
            Konno,H., Fukunishi,Y., Shibata,K., Itoh,M., Carninci,P.,
            Sugahara,Y. and Hayashizaki,Y.
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
            Please visit our web site (http://genome.gsc.riken.go.jp) for
            further details.
            e mouse tissues.
FEATURES             Location/Qualifiers
     source          1..420
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="G430047C11"
                     /cell_line="RCB-1283 B16 melanoma"
                     /clone_lib="RIKEN full-length enriched, RCB-1283 B16
                     melanoma cDNA"
BASE COUNT      102 a    103 c     87 g    128 t
ORIGIN

  Query Match             25.2%;  Score 388.4;  DB 10;  Length 420;
  Best Local Similarity   99.5%;  Pred. No. 7.3e-70;
  Matches  400;  Conservative    0;  Mismatches    1;  Indels    1;  Gaps    1;



Qy          1 GCTCCTGGCAGAGTTTTCTGTCGAGACAGAAGCCGACAGCAGAATGGCACAGAATTTATC 60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         19 GCTCCTGGCAGAGTTTTCTGTCGAGACAGAAGCCGACAGCAGAATGGCACAGAATTTATC 78

Qy         61 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTTA 120
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         79 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTTA 138

Qy        121 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCTT 180
              |||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||
Db        139 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTTGGCTACCTCTT 198

Qy        181 CTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACTT 240
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        199 CTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACTT 258

Qy        241 TGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATAAGGGGACCTA 300
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        259 TGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATAAGGGGACCTA 318

Qy        301 TGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCAGCAT 360
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        319 TGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCAGCAT 378

Qy        361 CCTCTTCCTCACTTTCATTAG-CATGGACCGATATCTGCTCA 401
              ||||||||||||||||||||| ||||||||||||||||||||
Db        379 CCTCTTCCTCACTTTCATTAGCCATGGACCGATATCTGCTCA 420



RESULT 10
BB778587
LOCUS       BB778587     426 bp    mRNA    linear   EST 15-NOV-2001
DEFINITION  BB778587 RIKEN full-length enriched, RCB-1283 B16 melanoma cDNA Mus
            musculus cDNA clone G430047C11 3', mRNA sequence.
ACCESSION   BB778587
VERSION     BB778587.1  GI:16939287
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 426)
  AUTHORS   Akimura,T., Arakawa,T., Carninci,P., Furuno,M., Hanagaki,T.,
            Hayatsu,N., Hiramoto,K., Hiraoka,T., Hirozane,T., Imotani,K.,
            Ishii,Y., Ito,M., Kawai,J., Kojima,Y., Konno,H., Kouda,M.,
            Matsuyama,T., Nakamura,M., Nishi,K., Nomura,K., Numasaki,R.,
            Okazaki,Y., Okido,T., Saito,R., Sakai,C., Sakai,K., Sakazume,N.,
            Sasaki,D., Sato,K., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Suzuki,H., Tagawa,A., Takahashi,F., Takaku-Akahira,S.,
            Tanaka,T., Tomaru,A., Toya,T., Watahiki,A., Yasunishi,A.,
            Muramatsu,M. and Hayashizaki,Y.
  TITLE     RIKEN Encyclopedia of Mouse Full-length cDNAs (Akimura,T., et al.
            2001)
  JOURNAL   Unpublished
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)
            wagi,K., Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E.,
            Watahiki,M., Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T.,
            Matsuura,S., Kawai,J., Okazaki,Y., Muramatsu,M., Inoue,Y.,
            Kira,A. and Hayashizaki,Y.
            RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome Res.
            10 (11), 1757-1771 (2000)
            Konno,H., Fukunishi,Y., Shibata,K., Itoh,M., Carninci,P.,
            Sugahara,Y. and Hayashizaki,Y.
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
            Please visit our web site (http://genome.gsc.riken.go.jp) for
            further details.
            e mouse tissues.
FEATURES             Location/Qualifiers
     source          1..426
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="G430047C11"
                     /cell_line="RCB-1283 B16 melanoma"
                     /clone_lib="RIKEN full-length enriched, RCB-1283 B16
                     melanoma cDNA"
BASE COUNT      153 a     58 c     76 g    139 t
ORIGIN

  Query Match             24.9%;  Score 384.8;  DB 10;  Length 426;
  Best Local Similarity   98.8%;  Pred. No. 4.1e-69;
  Matches  419;  Conservative    0;  Mismatches    2;  Indels    3;  Gaps    3;

Qy       1123 GGGCTGGAGTACAAGCTGGGTCCACATGAATCAGAAGGCAGCTCTCTGTTCTGATTTTAG 1182
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db          3 GGGCTGGAGTACAAGCTGGGTCCACATGAATCAGAAGGCAGCTCTCTGTTCTGATTTTAG 62

Qy       1183 GTTATACCCAGAGTATGGAAAAAATAA-GGCATGAGAAAGCATTGACATCTTCACTTAAG 1241
              ||||||||||||||||||||||||||| ||||||| ||||||||||||||||||||||||
Db         63 GTTATACCCAGAGTATGGAAAAAATAAGGGCATGAAAAAGCATTGACATCTTCACTTAAG 122

Qy       1242 AACTGAACAAAAGAGAACAAATATTGTCAATGTTTGGACACTTAGGATCTGAAATCTTGG 1301
              |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |
Db        123 AACTGAACAAAAGAGAACAAATATTGTCAATGTTTGGACACTTAGGATCTGAAATCTTTG 182

Qy       1302 AAATTTTAAGACCTCTTTTTCTATCAGTGTAAAAGGAATACAAGATAGCTAGTTGCAAAT 1361
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        183 AAATTTTAAGACCTCTTTTTCTATCAGTGTAAAAGGAATACAAGATAGCTAGTTGCAAAT 242

Qy       1362 GCTGAATGCATTTCATCATTGGTCA-GGTCGATAAGCGTGTTTCTGAAATAGTCTTATTT 1420
              ||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||
Db        243 GCTGAATGCATTTCATCATTGGTCACGGTCGATAAGCGTGTTTCTGAAATAGTCTTATTT 302

Qy       1421 TTATTCTTGTAATATTAAAATTTATGTGAAAAATGAATATAATTCAATGTACAACATTAG 1480
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        303 TTATTCTTGTAATATTAAAATTTATGTGAAAAATGAATATAATTCAATGTACAACATTAG 362

Qy       1481 ATTTTCTA-TTTGAAAATTATATTTCTTGAAAAAATAACTGCTGTGCCTAAATAAATCAA 1539
              |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
Db        363 ATTTTCTAGTTTGAAAATTATATTTCTTGAAAAAATAACTGCTGTGCCTAAATAAATCAA 422

Qy       1540 TATA 1543
              ||||
Db        423 TATA 426
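
Note on the alignment blocks: the line between each Qy and Db row marks identical positions with a bar and leaves mismatches and gaps blank, so the Matches, Mismatches and Indels counts can be tallied column by column. The following is a minimal sketch of that bookkeeping, assuming the two rows are already aligned, equal in length, and use "-" for gap positions. The function names `match_line` and `tally` are hypothetical and are not part of the search program.

```python
def match_line(qy: str, db: str) -> str:
    """Return '|' under identical bases and ' ' under mismatches or gaps."""
    if len(qy) != len(db):
        raise ValueError("aligned rows must have the same length")
    return "".join("|" if q == d and q != "-" else " " for q, d in zip(qy, db))


def tally(qy: str, db: str) -> tuple:
    """Count (matches, mismatches, indels) over one aligned block."""
    bars = match_line(qy, db)
    matches = bars.count("|")
    indels = sum(1 for q, d in zip(qy, db) if q == "-" or d == "-")
    mismatches = len(qy) - matches - indels
    return matches, mismatches, indels


if __name__ == "__main__":
    # Second block of RESULT 10 (Qy 1183-1241 against Db 63-122)
    qy = "GTTATACCCAGAGTATGGAAAAAATAA-GGCATGAGAAAGCATTGACATCTTCACTTAAG"
    db = "GTTATACCCAGAGTATGGAAAAAATAAGGGCATGAAAAAGCATTGACATCTTCACTTAAG"
    print(match_line(qy, db))
    print(tally(qy, db))
```

For this block the tally is 58 matches, 1 mismatch and 1 indel; summing the tallies over all blocks of a result should give the totals printed in its summary lines.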



RESULT 11
BB739482
LOCUS       BB739482     396 bp    mRNA    linear   EST 15-OCT-2001
DEFINITION  BB739482 RIKEN full-length enriched, 6 days neonate spleen Mus
            musculus cDNA clone F430113M16 3', mRNA sequence.
ACCESSION   BB739482
VERSION     BB739482.1  GI:16142487
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 396)
  AUTHORS   Akimura,T., Arakawa,T., Carninci,P., Furuno,M., Hanagaki,T.,
            Hayatsu,N., Hiramoto,K., Hiraoka,T., Hirozane,T., Imotani,K.,
            Ishii,Y., Ito,M., Kawai,J., Kojima,Y., Konno,H., Kouda,M.,
            Matsuyama,T., Nakamura,M., Nishi,K., Nomura,K., Numasaki,R.,
            Okazaki,Y., Okido,T., Saito,R., Sakai,C., Sakai,K., Sakazume,N.,
            Sasaki,D., Sato,K., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Suzuki,H., Tagawa,A., Takahashi,F., Takaku-Akahira,S.,
            Tanaka,T., Tomaru,A., Toya,T., Watahiki,A., Yasunishi,A.,
            Muramatsu,M. and Hayashizaki,Y.
  TITLE     RIKEN Encyclopedia of Mouse Full-length cDNAs (Akimura,T., et al.
            2001)
  JOURNAL   Unpublished
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)
            wagi,K., Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E.,
            Watahiki,M., Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T.,
            Matsuura,S., Kawai,J., Okazaki,Y., Muramatsu,M., Inoue,Y.,
            Kira,A. and Hayashizaki,Y.
            RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome Res.
            10 (11), 1757-1771 (2000)
            Konno,H., Fukunishi,Y., Shibata,K., Itoh,M., Carninci,P.,
            Sugahara,Y. and Hayashizaki,Y.
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
            Please visit our web site (http://genome.gsc.riken.go.jp) for
            further details.
            e mouse tissues.
FEATURES             Location/Qualifiers
     source          1..396
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="taxon:10090"
                     /clone="F430113M16"
                     /tissue_type="spleen"
                     /dev_stage="6 days neonate"
                     /clone_lib="RIKEN full-length enriched, 6 days neonate
                     spleen"
BASE COUNT      142 a     52 c     62 g    140 t
ORIGIN

  Query Match             24.7%;  Score 380.4;  DB 10;  Length 396;
  Best Local Similarity   99.7%;  Pred. No. 3.3e-68;
  Matches  381;  Conservative    0;  Mismatches    1;  Indels    0;  Gaps    0;



Qy       1162 AGCTCTCTGTTCTGATTTTAGGTTATACCCAGAGTATGGAAAAAATAAGGCATGAGAAAG 1221
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db          1 AGCTCTCTGTTCTGATTTTAGGTTATACCCAGAGTATGGAAAAAATAAGGCATGAGAAAG 60

Qy       1222 CATTGACATCTTCACTTAAGAACTGAACAAAAGAGAACAAATATTGTCAATGTTTGGACA 1281
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         61 CATTGACATCTTCACTTAAGAACTGAACAAAAGAGAACAAATATTGTCAATGTTTGGACA 120

Qy       1282 CTTAGGATCTGAAATCTTGGAAATTTTAAGACCTCTTTTTCTATCAGTGTAAAAGGAATA 1341
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        121 CTTAGGATCTGAAATCTTGGAAATTTTAAGACCTCTTTTTCTATCAGTGTAAAAGGAATA 180

Qy       1342 CAAGATAGCTAGTTGCAAATGCTGAATGCATTTCATCATTGGTCAGGTCGATAAGCGTGT 1401
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        181 CAAGATAGCTAGTTGCAAATGCTGAATGCATTTCATCATTGGTCAGGTCGATAAGCGTGT 240

Qy       1402 TTCTGAAATAGTCTTATTTTTATTCTTGTAATATTAAAATTTATGTGAAAAATGAATATA 1461
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        241 TTCTGAAATAGTCTTATTTTTATTCTTGTAATATTAAAATTTATGTGAAAAATGAATATA 300

Qy       1462 ATTCAATGTACAACATTAGATTTTCTATTTGAAAATTATATTTCTTGAAAAAATAACTGC 1521
              ||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        301 ATTCAATTTACAACATTAGATTTTCTATTTGAAAATTATATTTCTTGAAAAAATAACTGC 360

Qy       1522 TGTGCCTAAATAAATCAATATA 1543
              ||||||||||||||||||||||
Db        361 TGTGCCTAAATAAATCAATATA 382



RESULT 12
AI649254/c
LOCUS       AI649254     367 bp    mRNA    linear   EST 30-APR-1995
DEFINITION  uk27c10.x1 Sugano mouse kidney mkia Mus musculus cDNA clone
            IMAGE:1970226 3', mRNA sequence.
ACCESSION   AI649254
VERSION     AI649254.1  GI:4730088
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 367)
  AUTHORS   Marra,M., Hillier,L., Kucaba,T., Martin,J., Beck,C., Wylie,T.,
            Underwood,K., Steptoe,M., Theising,B., Allen,M., Bowers,Y.,
            Person,B., Swaller,T., Gibbons,M., Pape,D., Harvey,N., Schurk,R.,
            Ritter,E., Kohn,S., Shin,T., Jackson,Y., Cardenas,M., McCann,R.,
            Waterston,R. and Wilson,R.
  TITLE     The WashU-NCI Mouse EST Project 1999
  JOURNAL   Unpublished
COMMENT     Other_ESTs: uk27c10.y1
            Contact: Marra M/WashU-NCI Mouse EST Project 1999
            Washington University School of Medicine
            4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108, USA
            Tel: 314 286 1800
            Fax: 314 286 1810
            Email: mouseest@watson.wustl.edu
            This clone is available royalty-free through LLNL; contact the
            IMAGE Consortium (info@image.llnl.gov) for further information.
            MGI:986966
            This clone was previously sequenced on the 5' end only, this new
            data is from the 3' end
            Seq primer: custom primer used
            High quality sequence stop: 353.
FEATURES             Location/Qualifiers
     source          1..367
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:1970226"
                     /sex="female"
                     /dev_stage="adult"
                     /lab_host="DH10B"
                     /clone_lib="Sugano mouse kidney mkia"
                     /note="Organ: kidney; Vector: pME18S-FL3; Site_1: DraIII
                     (CACTGTGTG); Site_2: DraIII (CACCATGTG); 1st strand cDNA
                     was primed with an oligo(dT) primer
                     [ATGTGGCCTTTTTTTTTTTTTTTTT]; double-stranded cDNA was
                     ligated to a DraIII adaptor [TGTTGGCCTACTGG], digested
                     and cloned into distinct DraIII sites of the pME18S-FL3
                     vector (5' site CACTGTGTG, 3' site CACCATGTG). XhoI should
                     be used to isolate the cDNA insert. Size selection was
                     performed to exclude fragments <1.5kb. Library
                     constructed by Dr. Sumio Sugano (University of Tokyo
                     Institute of Medical Science). Custom primers for
                     sequencing: 5' end primer CTTCTGCTCTAAAAGCTGCG and 3' end
                     primer CGACCTGCAGCTCGAGCACA."
BASE COUNT      106 a     73 c     71 g    117 t
ORIGIN

  Query Match             23.6%;  Score 363.8;  DB 9;  Length 367;
  Best Local Similarity   99.5%;  Pred. No. 8.9e-65;
  Matches  365;  Conservative    0;  Mismatches    2;  Indels    0;  Gaps    0;



Qy       1035 CACTTGATAAACAGTGCTGTGCAGTTGAGTTTTAACTAAGTAAACCACCATTTCTAGGCT 1094
              |||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||
Db        367 CACTTGATAAACAGTGCTGTGCAGTTGAGTTTTAACTAAGTAAACCACCATTTCTACGCT 308

Qy       1095 TTAGCTTTCCACCATCCTCCAACCCCCAGGGCTGGAGTACAAGCTGGGTCCACATGAATC 1154
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        307 TTAGCTTTCCACCATCCTCCAACCCCCAGGGCTGGAGTACAAGCTGGGTCCACATGAATC 248

Qy       1155 AGAAGGCAGCTCTCTGTTCTGATTTTAGGTTATACCCAGAGTATGGAAAAAATAAGGCAT 1214
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        247 AGAAGGCAGCTCTCTGTTCTGATTTTAGGTTATACCCAGAGTATGGAAAAAATAAGGCAT 188

Qy       1215 GAGAAAGCATTGACATCTTCACTTAAGAACTGAACAAAAGAGAACAAATATTGTCAATGT 1274
              |||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||
Db        187 GAGAAAGCATTGACATCTTCACTTAAGATCTGAACAAAAGAGAACAAATATTGTCAATGT 128

Qy       1275 TTGGACACTTAGGATCTGAAATCTTGGAAATTTTAAGACCTCTTTTTCTATCAGTGTAAA 1334
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        127 TTGGACACTTAGGATCTGAAATCTTGGAAATTTTAAGACCTCTTTTTCTATCAGTGTAAA 68

Qy       1335 AGGAATACAAGATAGCTAGTTGCAAATGCTGAATGCATTTCATCATTGGTCAGGTCGATA 1394
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         67 AGGAATACAAGATAGCTAGTTGCAAATGCTGAATGCATTTCATCATTGGTCAGGTCGATA 8

Qy       1395 AGCGTGT 1401
              |||||||
Db          7 AGCGTGT 1
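
The percentages in each hit's summary lines can be checked against the counts printed beside them. The sketch below is an inference from the reported numbers, not a documented definition: it assumes "Query Match" is the hit score divided by the perfect score (1543 for this query) and "Best Local Similarity" is the number of matches divided by the number of alignment columns (matches + mismatches + indels). The function names are illustrative.

```python
def query_match_pct(score: float, perfect_score: float) -> float:
    """Score as a percentage of the query's perfect score (assumed definition)."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_pct(matches: int, mismatches: int, indels: int) -> float:
    """Identical columns as a percentage of all alignment columns (assumed definition)."""
    columns = matches + mismatches + indels
    return round(100.0 * matches / columns, 1)


# Values reported for RESULT 12 (AI649254).
print(query_match_pct(363.8, 1543))          # 23.6
print(best_local_similarity_pct(365, 2, 0))  # 99.5
```

Under these assumptions the same two functions also reproduce the percentages listed for RESULT 13, 14 and 15.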



RESULT 13
BB645274
LOCUS       BB645274      636 bp    mRNA    linear   EST 31-AUG-2001
DEFINITION  BB645274 RIKEN full-length enriched, 4 days neonate male adipose
            Mus musculus cDNA clone B430012O21 5', mRNA sequence.
ACCESSION   BB645274
VERSION     BB645274.1  GI:15402306
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 636)
  AUTHORS   Arakawa,T., Carninci,P., Fukuda,S., Furuno,M., Hanagaki,T.,
            Hara,A., Hiramoto,K., Hori,F., Ishii,Y., Ito,M., Kawai,J.,
            Konno,H., Kouda,M., Koya,S., Matsuyama,T., Miyazaki,A., Nomura,K.,
            Ohno,M., Okazaki,Y., Okido,T., Saito,R., Sakai,C., Sakai,K.,
            Sano,H., Sasaki,D., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Suzuki,H., Tagami,M., Tagawa,A., Takahashi,F.,
            Takeda,Y., Tanaka,T., Toya,T., Muramatsu,M. and Hayashizaki,Y.
  TITLE     RIKEN Mouse ESTs (Arakawa,T., et al. 2001)
  JOURNAL   Unpublished
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)
            wagi,K., Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E.,
            Watahiki,M., Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T.,
            Matsuura,S., Kawai,J., Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A.
            and Hayashizaki,Y.
            RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome Res.
            10 (11), 1757-1771 (2000)
            Konno,H., Fukunishi,Y., Shibata,K., Itoh,M., Carninci,P.,
            Sugahara,Y. and Hayashizaki,Y.
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
            Yamanaka,I., Kiyosawa,H., Kondo,S., Saito,T., Shinagawa,A.,
            Aizawa,K., Fukuda,S., Hara,A., Itoh,M., Kawai,J., Shibata,K.,
            Arakawa,T., Ishii,Y. and Hayashizaki,Y.
            Mapping of 19032 mouse cDNAs on mouse chromosomes. J. Struct.
            Func. Genomics 2 pre, L72-L86 (2001
            Please visit our web site (http://genome.gsc.riken.go.jp) for
            further details.
            e mouse tissues.
FEATURES             Location/Qualifiers
     source          1..636
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="B430012O21"
                     /sex="male"
                     /tissue_type="adipose"
                     /dev_stage="4 days neonate"
                     /lab_host="DH10B"
                     /clone_lib="RIKEN full-length enriched, 4 days neonate
                     male adipose"
                     /note="Site_1: SalI; Site_2: BamHI; cDNA library was
                     prepared and sequenced in Mouse Genome Encyclopedia
                     Project of Genome Exploration Research Group in Riken
                     Genomic Sciences Center and Genome Science Laboratory in
                     RIKEN. Division of Experimental Animal Research in Riken
                     contributed to prepare mouse tissues. 1st strand cDNA was
                     primed with a primer [5'
                     GAGAGAGAGAAGGATCCAAGAGCTCTTTTTTTTTTTTTTTTVN 3'], cDNA was
                     prepared by using trehalose thermo-activated reverse
                     transcriptase and subsequently enriched for full-length by
                     cap-trapper. cDNA went through one round of normalization
                     to Rot = 10.0 and subtraction to Rot = 229.0. Second
                     strand cDNA was prepared with the primer adapter of
                     sequence [5' GAGAGAGAGATTCTCGAGTTAATTAAATTAATCCCCCCCCCCCCC
                     3']. cDNA was cleaved with XhoI and BamHI. Vector: a
                     modified pBluescript KS(+) after bulk excision from Lambda
                     FLC I."
BASE COUNT      170 a    118 c    140 g    207 t      1 others
ORIGIN

Query Match 23.2%; Score 357.6; DB 10; Length 636;

Best Local Similarity 91.7%; Pred. No. 1.8e-63;

Matches 389; Conservative 0; Mismatches 34; Indels 1; Gaps 1;



Qy          1 GCTCCTGGCAGAGTTTTCTGTCGAGACAGAAGCCGACAGCAGAATGGCACAGAATTTATC 60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         20 GCTCCTGGCAGAGTTTTCTGTCGAGACAGAAGCCGACAGCAGAATGGCACAGAATTTATC 79

Qy         61 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTTA 120
              ||||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||
Db         80 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGGATTTTA 139

Qy        121 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCTT 180
              ||||||||||||||||||||||||||||||||||||||||| |||||| || |||| |||
Db        140 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGGGGTGTTTGGGTACCTGTT 199

Qy        181
Db        200

Qy        241
Db        260

Qy        301
Db        320

Qy        360
Db        380

Qy        420
Db        440



RESULT 14
BB846608
LOCUS       BB846608      416 bp    mRNA    linear   EST 26-NOV-2001
DEFINITION  BB846608 RIKEN full-length enriched, adult male kidney Mus musculus
            cDNA clone F530003I24 5', mRNA sequence.
ACCESSION   BB846608
VERSION     BB846608.1  GI:17084983
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 416)
  AUTHORS   Akimura,T., Arakawa,T., Carninci,P., Furuno,M., Hanagaki,T.,
            Hayatsu,N., Hiramoto,K., Hiraoka,T., Hirozane,T., Imotani,K.,
            Ishii,Y., Ito,M., Kawai,J., Kojima,Y., Konno,H., Kouda,M.,
            Matsuyama,T., Nakamura,M., Nishi,K., Nomura,K., Numasaki,R.,
            Okazaki,Y., Okido,T., Saito,R., Sakai,C., Sakai,K., Sakazume,N.,
            Sasaki,D., Sato,K., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Suzuki,H., Tagawa,A., Takahashi,F., Takaku-Akahira,S.,
            Tanaka,T., Tomaru,A., Toya,T., Watahiki,A., Yasunishi,A.,
            Muramatsu,M. and Hayashizaki,Y.
  TITLE     RIKEN Encyclopedia of Mouse Full-length cDNAs (Akimura,T., et al.
            2001)
  JOURNAL   Unpublished
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)
            wagi,K., Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E.,
            Watahiki,M., Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T.,
            Matsuura,S., Kawai,J., Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A.
            and Hayashizaki,Y.
            RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome Res.
            10 (11), 1757-1771 (2000)
            Konno,H., Fukunishi,Y., Shibata,K., Itoh,M., Carninci,P.,
            Sugahara,Y. and Hayashizaki,Y.
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
            Please visit our web site (http://genome.gsc.riken.go.jp) for
            further details.
            e mouse tissues.
FEATURES             Location/Qualifiers
     source          1..416
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="F530003I24"
                     /sex="male"
                     /tissue_type="kidney"
                     /dev_stage="adult"
                     /lab_host="SOLR"
                     /clone_lib="RIKEN full-length enriched, adult male kidney"
                     /note="Site_1: XhoI; Site_2: SstI; cDNA library was
                     prepared and sequenced in Mouse Genome Encyclopedia
                     Project of Genome Exploration Research Group in Riken
                     Genomic Sciences Center and Genome Science Laboratory in
                     RIKEN. Division of Experimental Animal Research in Riken
                     contributed to prepare mouse tissues. 1st strand cDNA was
                     primed with a primer [5'
                     GAGAGAGAGAGCGGCCGCAACTCGAGTTTTTTTTTTTTTTTTVN 3'], cDNA was
                     prepared by using trehalose thermo-activated reverse
                     transcriptase and subsequently enriched for full-length by
                     cap-trapper. Second strand cDNA was prepared with the
                     primer adapter of sequence [5'
                     GAGAGAGAGAAGGATCCAAGAGCTCAATTAATTAATTAAACCCCCCCCCCC 3'].
                     cDNA was cleaved with XhoI and SstI."
BASE COUNT      107 a     93 c     87 g    129 t
ORIGIN

Query Match 23.0%; Score 354.2; DB 10; Length 416; 

Best Local Similarity 97.3%; Pred. No. 8.7e-63; 

Matches 392; Conservative 0; Mismatches 8; Indels 3; Gaps 3; 

Qy          1 GCTCCTGGCAGAGTTTTCTGTCGAGACAGAAGCCGACAGCAGAATGGCACAGAATTTATC 60
              |||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||
Db         16 GCTCCTGGCAGAGTTTTCTGTCGAGACAGAAGCCGAAAGCAGAATGGCACAGAATTTATC 75

Qy         61 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTTA 120
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         76 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTTA 135

Qy        121 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCTT 180
              |||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||
Db        136 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTTGGCTACCTCTT 195

Qy        181 CTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACTT 240
              |||||||||||||||||| ||||||||||||||||||||||| |||| ||||||||||||
Db        196 CTGCATGAAGAACTGGAAAAGCAGCAATGTCTATCTTTTTAAACTTT-CATCTCTGACTT 254

Qy        241 TGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATAAGGGGACCTA 300
              ||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||
Db        255 TGCTTTCCTGTGCACCCTT-CCATCCTGATAAAGAGTTATGCCAATGATAAGGGGACCTA 313

Qy        301 TGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCAGCAT 360
              |||||||||||| ||||||||||||||||||||| |||||| ||||||||| ||||||||
Db        314 TGGAGATGTTCTATGTATAAGCAACCGATATGTGGTTCACAACAACCTCTAAACCAGCAT 373

Qy        361 CCTCTTCCTCACTTTCATTAG-CATGGACCGATATCTGCTCAT 402
              ||||||||||||||||||||| |||||||||||||||||||||
Db        374 CCTCTTCCTCACTTTCATTAGCCATGGACCGATATCTGCTCAT 416
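
The fractional scores in this report can be related to the match, mismatch and gap counts printed for each hit. The sketch below is a consistency check under stated assumptions, not the search program's documented scoring: a match is taken as +1 (consistent with a perfect score of 1543 for a 1543-base query), the mismatch value of -0.6 is inferred from the reported scores and does not appear in the output, and a gap of length n is assumed to cost Gapop + n × Gapext using the header values 10.0 and 1.0. Ambiguity codes such as N may be scored differently, so the sketch is not expected to reproduce every reported score.

```python
MATCH = 1.0       # assumed: identity scores +1
MISMATCH = -0.6   # ASSUMPTION inferred from reported scores, not stated in the output
GAP_OPEN = 10.0   # "Gapop 10.0" from the search header
GAP_EXTEND = 1.0  # "Gapext 1.0" from the search header


def expected_score(matches: int, mismatches: int, gap_lengths: list[int]) -> float:
    """Score implied by the alignment counts under the assumptions above."""
    gap_cost = sum(GAP_OPEN + GAP_EXTEND * n for n in gap_lengths)
    return matches * MATCH + mismatches * MISMATCH - gap_cost


# RESULT 14: 392 matches, 8 mismatches, 3 gaps of one position each.
print(round(expected_score(392, 8, [1, 1, 1]), 1))  # 354.2
```

With the same constants, the counts reported for RESULT 12 (365 matches, 2 mismatches, no gaps) and RESULT 13 (389 matches, 34 mismatches, one single-position gap) give 363.8 and 357.6, which agree with the printed scores.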





RESULT 15
BY368584
LOCUS       BY368584      408 bp    mRNA    linear   EST 12-DEC-200?
DEFINITION  BY368584 RIKEN full-length enriched, 6 days neonate spleen Mus
            musculus cDNA clone F430110C01 3', mRNA sequence.
ACCESSION   BY368584
VERSION     BY368584.1  GI:26598072
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 408)
  AUTHORS   Okazaki,Y., Furuno,M., Kasukawa,T., Adachi,J., Bono,H., Kondo,S.,
            Nikaido,I., Osato,N., Saito,R., Suzuki,H., Yamanaka,I.,
            Kiyosawa,H., Yagi,K., Tomaru,Y., Hasegawa,Y., Nogami,A.,
            Schonbach,C., Gojobori,T., Baldarelli,R., Hill,D.P., Bult,C.,
            Hume,D.A., Quackenbush,J., Schriml,L.M., Kanapin,A., Matsuda,H.,
            Batalov,S., Beisel,K.W., Blake,J.A., Bradt,D., Brusic,V.,
            Chothia,C., Corbani,L.E., Cousins,S., Dalla,E., Dragani,T.A.,
            Fletcher,C.F., Forrest,A., Frazer,K.S., Gaasterland,T.,
            Gariboldi,M., Gissi,C., Godzik,A., Gough,J., Grimmond,S.,
            Gustincich,S., Hirokawa,N., Jackson,I.J., Jarvis,E.D., Kanai,A.,
            Kawaji,H., Kawasawa,Y., Kedzierski,R.M., King,B.L., Konagaya,A.,
            Kurochkin,I.V., Lee,Y., Lenhard,B., Lyons,P.A., Maglott,D.R.,
            Maltais,L., Marchionni,L., McKenzie,L., Miki,H., Nagashima,T.,
            Numata,K., Okido,T., Pavan,W.J., Pertea,G., Pesole,G.,
            Petrovsky,N., Pillai,R., Pontius,J.U., Qi,D., Ramachandran,S.,
            Ravasi,T., Reed,J.C., Reed,D.J., Reid,J., Ring,B.Z., Ringwald,M.,
            Sandelin,A., Schneider,C., Semple,C.A., Setou,M., Shimada,K.,
            Sultana,R., Takenaka,Y., Taylor,M.S., Teasdale,R.D., Tomita,M.,
            Verardo,R., Wagner,L., Wahlestedt,C., Wang,Y., Watanabe,Y.,
            Wells,C., Wilming,L.G., Wynshaw-Boris,A., Yanagisawa,M., Yang,I.,
            Yang,L., Yuan,Z., Zavolan,M., Zhu,Y., Zimmer,A., Carninci,P.,
            Hayatsu,N., Hirozane-Kishikawa,T., Konno,H., Nakamura,M.,
            Sakazume,N., Sato,K., Shiraki,T., Waki,K., Kawai,J., Aizawa,K.,
            Arakawa,T., Fukuda,S., Hara,A., Hashizume,W., Imotani,K.,
            Ishii,Y., Itoh,M., Kagawa,I., Miyazaki,A., Sakai,K., Sasaki,D.,
            Shibata,K., Shinagawa,A., Yasunishi,A., Yoshino,M., Waterston,R.,
            Lander,E.S., Rogers,J., Birney,E. and Hayashizaki,Y.
  TITLE     Analysis of the mouse transcriptome based on functional annotation
            of 60,770 full-length cDNAs
  JOURNAL   Nature 420, 563-573 (2002)
  MEDLINE   22354683
   PUBMED   12466851
COMMENT     Contact: Yoshihide Hayashizaki
            Laboratory for Genome Exploration Research Group, RIKEN Genomic
            Sciences Center (GSC), Yokohama Institute
            The Institute of Physical and Chemical Research (RIKEN)
            1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
            Tel: 81-45-503-9222
            Fax: 81-45-503-9216
            Email: genome-res@gsc.riken.go.jp,
            URL: http://genome.gsc.riken.go.jp/
            Aizawa,K., Akimura,T., Arakawa,T., Carninci,P., Fukuda,S.,
            Hirozane,T., Imotani,K., Ishii,Y., Itoh,M., Kawai,J., Konno,H.,
            Miyazaki,A., Murata,M., Nakamura,M., Nomura,K., Numazaki,R.,
            Ohno,M., Sakai,K., Sakazume,N., Sasaki,D., Sato,K., Shibata,K.,
            Shiraki,T., Tagami,M., Waki,K., Watahiki,A., Muramatsu,M. and
            Hayashizaki,Y. Direct Submission
            Computational Analysis of Full-Length Mouse cDNAs Compared with
            Human Genome Sequences. Mamm. Genome 12, 673-677 (2001)
            Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new
            genes. Genome Res. 10 (10), 1617-1630 (2000)
            RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer. Genome Res.
            10 (11), 1757-1771 (2000)
            Computer-based methods for the mouse full-length cDNA
            encyclopedia: real-time sequence clustering for construction of a
            nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001)
            cDNA library was prepared and sequenced in Mouse Genome
            Encyclopedia Project of Genome Exploration Research Group in Riken
            Genomic Sciences Center and Genome Science Laboratory in RIKEN.
            Division of Experimental Animal Research in Riken contributed to
            prepare mouse tissues.
            Please visit our web site (http://genome.gsc.riken.go.jp) for
            further details.
FEATURES             Location/Qualifiers
     source          1..408
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="taxon:10090"
                     /clone="F430110C01"
                     /tissue_type="spleen"
                     /dev_stage="6 days neonate"
                     /clone_lib="RIKEN full-length enriched, 6 days neonate
                     spleen"
BASE COUNT      145 a     58 c     63 g    141 t      1 others
ORIGIN



Query Match 22.7%; Score 350.6; DB 13; Length 408;

Best Local Similarity 98.0%; Pred. No. 4.8e-62;

Matches 386; Conservative 0; Mismatches 5; Indels 3; Gaps 3;



Qy       1153 TCAGAAGGCAGCTCTCTGTTCTGATTTTAGGTTATACCCAGAGTATGGAAAAAATAAGGC 1212
              ||||||| ||||||||||||||||||||||||||||| ||||||||| ||||||||||||
Db          1 TCAGAAGCCAGCTCTCTGTTCTGATTTTAGGTTATACGCAGAGTATGNAAAAAATAAGGC 60

Qy       1213 ATGAGAAAGCA-TTGACATCTTCACTTAAGAACTGAACAAAAGAGAACAAATA-TTGTCA 1270
              ||||||||||| ||||||||||||||||||||||||||||||||||||||||| || |||
Db         61 ATGAGAAAGCAGTTGACATCTTCACTTAAGAACTGAACAAAAGAGAACAAATAGTTCTCA 120

Qy       1271 ATGTTTGGACACTTAGGATCTGAAATCTTGGAAATTTTAAGACCTCTTTTTCTATCAGTG 1330
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        121 ATGTTTGGACACTTAGGATCTGAAATCTTGGAAATTTTAAGACCTCTTTTTCTATCAGTG 180

Qy       1331 TAAAAGGAATACAAGATAG-CTAGTTGCAAATGCTGAATGCATTTCATCATTGGTCAGGT 1389
              ||||||||||||||||||| |||||| |||||||||||||||||||||||||||||||||
Db        181 TAAAAGGAATACAAGATAGCCTAGTTCCAAATGCTGAATGCATTTCATCATTGGTCAGGT 240

Qy       1390 CGATAAGCGTGTTTCTGAAATAGTCTTATTTTTATTCTTGTAATATTAAAATTTATGTGA 1449
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        241 CGATAAGCGTGTTTCTGAAATAGTCTTATTTTTATTCTTGTAATATTAAAATTTATGTGA 300

Qy       1450 AAAATGAATATAATTCAATGTACAACATTAGATTTTCTATTTGAAAATTATATTTCTTGA 1509
              ||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||
Db        301 AAAATGAATATAATTCAATTTACAACATTAGATTTTCTATTTGAAAATTATATTTCTTGA 360

Qy       1510 AAAAATAACTGCTGTGCCTAAATAAATCAATATA 1543
              ||||||||||||||||||||||||||||||||||
Db        361 AAAAATAACTGCTGTGCCTAAATAAATCAATATA 394

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OM nucleic - nucleic search, using sw model 
Run on: December 14, 2003, 10:09:19 ; Search time 450 Seconds

(without alignments)
9256.073 Million cell updates/sec

Title: US-09-891-138A-1

Perfect score: 1543

Sequence: 1 gctcctggcagagttttctg tgcctaaataaatcaatata 1543

Scoring table: IDENTITY_NUC

Gapop 10.0 , Gapext 1.0

Searched: 2552756 seqs, 1349719017 residues

Total number of hits satisfying chosen parameters: 5105512

Minimum DB seq length: 0

Maximum DB seq length: 2000000000

Post-processing:

Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries
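
The throughput figure in the header can be reproduced from the other header values. The sketch below assumes a "cell update" is one query-base by database-residue comparison and that both strands were searched; the factor of 2 is an inference from the header (the hit count 5105512 is exactly twice the 2552756 sequences searched), not something the output states. The function name is illustrative.

```python
def million_cell_updates_per_sec(query_len: int, db_residues: int,
                                 seconds: float, strands: int = 2) -> float:
    """Dynamic-programming cells filled per second, in millions.

    strands=2 is an assumption: it matches the reported rate and the
    doubled hit count, but is not stated explicitly in the report.
    """
    cells = strands * query_len * db_residues
    return cells / seconds / 1e6


rate = million_cell_updates_per_sec(1543, 1349719017, 450)
print(f"{rate:.3f}")  # 9256.073
```

The result agrees with the "9256.073 Million cell updates/sec" line for this 450-second search.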



Database :

N_Geneseq_19Jun03:*

1: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1980.DAT:*
2: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1981.DAT:*
3: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1982.DAT:*
4: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1983.DAT:*
5: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1984.DAT:*
6: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1985.DAT:*
7: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1986.DAT:*
8: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1987.DAT:*
9: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1988.DAT:*
10: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1989.DAT:*
11: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1990.DAT:*
12: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1991.DAT:*
13: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1992.DAT:*
14: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1993.DAT:*
15: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1994.DAT:*
16: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1995.DAT:*
17: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1996.DAT:*
18: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1997.DAT:*
19: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1998.DAT:*
20: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1999.DAT:*
21: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA2000.DAT:*
22: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA2001A.DAT:*
23: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA2001B.DAT:*
24: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA2002.DAT:*
25: /SIDS1/gcgdata/geneseq/geneseqn-embl/NA2003.DAT:*
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Pred. No. is the number of results predicted by chance to have a 



score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
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purinergic r 
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8. 
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8, 


.2 


1014 


24 
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18 


126. 


,6 


8. 


,2 
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25 


ABZ59170 


Human 
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19 


126. 
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8. 


.2 


1014 


25 


ABZ42582 
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G protein-co 


20 
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8, 


,2 
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21 


126. 


,6 


8. 


.2 
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ABS59232 
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22 
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8. 


.2 


1179 


25 
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Human 
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23 
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,6 


8. 


,2 


1288 


24 


ABL56197 


Human 
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24 


126. 
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8, 


.2 


1729 


22 


AAS08362 
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25 
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. 6 


8. 


.2 
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23 
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Human 


prostate exp 


26 
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,6 


8. 


.2 
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23 


ABV257 67 


Human 


prostate exp 


27 
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. 6 


8. 


.2 


1729 


23 


ABV29909 


Human 


prostate exp 


    28    126.6    8.2   1729  23  ABV30024   Human prostate exp
    29    126.6    8.2   1797  25  AAD50882   Human TARZAN cDNA.
    30    126.6    8.2   5435  24  ABL56198   Human P2Y1-li enco
    31    126.6    8.2   9905  24  AAK98324   Human purinergic-r
    32    125      8.1   1014  24  ABQ78847   Human G-protein co
    33    125      8.1   1014  24  AAD34278   Human AXOR89 (G-pr
    34    125      8.1   1014  24  AAD26370   Human G-protein co
    35    125      8.1   1560  24  ABS51730   Human novel polynu
    36    125      8.1   1851  24  ABS51678   DNA encoding human
    37    119.2    7.7   1020  22  AAH51011   Human nGPCR54 codi
    38    119.2    7.7   1020  24  ABS70244   DNA encoding human
    39    119      7.7   1313  22  AAK52430   Human polynucleoti
    40    112.6    7.3    740  23  ABV15662   Human prostate exp
    41    109      7.1   1014  25  AAD50884   Mouse TARZAN cDNA.
    42    109      7.1   7399  25  AAD50886   Mouse TARZAN genom
    43    104      6.7   1020  24  ABQ79300   Human GPCR designa
    44    104      6.7   1076  24  AAD29667   Human G-protein co
    45    103.2    6.7   6721  24  AAS18600   Purinergic recepto



ALIGNMENTS 



RESULT 1 
ABK12957 

ID ABK12957 standard; DNA; 1543 BP. 
XX 

AC ABK12957; 
XX 

DT 09-APR-2002 (first entry) 
XX 

DE DNA sequence of mouse G-protein coupled receptor TGR18 gene. 
XX 

KW Mouse; G-protein coupled; receptor; GPCR; TGR18; kidney disease; 

KW signal transduction modulator; cerebral cavernous malformation; 

KW hyperlipidemia; obesity; dyslexia; cardiac myxoma; renal failure; 

KW nephritis; hypertension; liver disease; cirrhosis; blood disorder; 

KW spleen-associated disorder; immune disorder; gene; ds.
XX 

OS Mus sp. 
XX 

FH Key Location/Qualifiers 

FT CDS 44..997

FT /*tag= a

FT /product= "Mouse G-protein coupled receptor TGR18"
XX 

PN WO200200719-A2. 
XX 

PD 03-JAN-2002.
XX 

PF 25-JUN-2001; 2001WO-US20363.
XX 

PR 23-JUN-2000; 2000US-213461P.
XX 

PA (TULA-) TULARIK INC. 
XX 

PI Lin DC, Zhao J, Chen J, Cutler G; 
XX 

DR WPI; 2002-147880/19. 

DR P-PSDB; AAU74904. 
XX 

PT New G-protein coupled receptor polypeptides, useful for identifying 

PT modulators of signal transduction for treating kidney disease, 

PT hyperlipidemia, obesity, dyslexia and cardiac myxoma 
XX 

PS Claim 18; Page 58; 78pp; English. 
XX 

CC The present invention relates to a new G-protein coupled receptor (GPCR) 

CC polypeptide comprising greater than 70% amino acid sequence identity to 

CC the amino acid sequence of human GPCRs TGR62, TGR21, TGR130.1, TGR130.2, 

CC human TGR213 or TGR92, 80% amino acid sequence identity to mouse TGR18 

CC or 90% amino acid sequence identity to human novel edg receptor protein, 

CC as defined in the specification. The GPCR covalently linked to a solid 

CC phase is useful for identifying a compound that modulates signal 

CC transduction. The identified compounds are useful for treating 

CC kidney disease, cerebral cavernous malformations, hyperlipidemia, 

CC obesity, dyslexia and cardiac myxoma. The molecules of the invention are 

CC useful for diagnosing disorders or conditions such as kidney-related 

CC conditions or diseases such as renal failure, nephritis, nephrotic 



CC syndrome, asymptomatic urinary abnormalities, renal tubule defects, 

CC hypertension and nephrolithiasis, liver-related disease or condition 

CC e.g. cirrhosis, infiltrations, lesions, functional disorders and jaundice 

CC and spleen-associated disorders or conditions e.g. splenic enlargement, 

CC immune disorders, blood disorders and others. Modulation of the 

CC polypeptide of the invention is useful to treat or prevent any of the 

CC above conditions or diseases. The present nucleic acid sequence encodes 

CC the mouse GPCR TGR18 protein of the invention. This sequence encodes one 

CC of seven novel G protein coupled receptors of the invention (ABK12957- 

CC ABK12964).

XX 

SQ Sequence 1543 BP; 438 A; 352 C; 293 G; 460 T; 0 other; 

Query Match 100.0%; Score 1543; DB 24; Length 1543; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 1543; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1 GCT CCT GGCAGAGTTTT CT GTCGAGACAGAAGCCGACAGCAGAAT GGCACAGAATTTAT C 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1 GCT C CT GGC AGAGT TTT CT GT C GAGAC AGAAGC C GAC AGC AGAAT GGCACAGAATTTAT C 60 

Qy 61 T T GT GAGAAT T GGT T G GCAAC AGAGG CT AT CTT GAATAAGT ACT AC CT CT CT GCATT T TA 120 

I I I I I II I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 61 TTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTTA 120 

Qy 121 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCTT 180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I II I III I I I I I I I I I 
Db 121 TGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCTT 180 

Qy 181 CTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACTT 240 

I I I I I I I I I I I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I II I I I I I I II 

Db 181 CT GC AT GAAGAACT G GAAC AGCAG CAAT GT CTAT CT TT T T AAC CTT T C C AT CT CT GACT T 240 

Qy 241 TGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATAAGGGGACCTA 300 

I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 
Db 241 TGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATAAGGGGACCTA 300 

Qy 301 T GGAGAT GT T CT CT GT ATAAGCAAC C GAT AT GT GCT T CAC AC CAAC C T CT ACAC CAGC AT 360 

I I I I I I I I I I II I I II I I I I II II I I I I I I I I I I I I I I I I I I I I I II II I I I II I I I I I I 
Db 301 T GGAGAT GT T CT CT GT ATAAGCAAC C GAT AT GT GCT T CAC AC CAAC CT CT AC AC CAGC AT 360 

Qy 361 CCTCTTCCT CACT T T CAT T AGCAT G GAC C GAT AT CTGCT CAT GAAGT AC C CT T T C C GAGA 420 

I I I I I I I I I I I I I II I I I I I II I II I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I I I I 
Db 361 CCTCTTCCT C ACT T T CAT T AGCAT G GAC C GAT AT CTGCT CAT GAAGT AC C CT TT C C GAGA 420 

Qy 421 ACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCTTAGT 4 80 

I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I II I I I I I I II I I I I I I II I I I I I I I I I I 

Db 421 ACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCTTAGT 4 80 

Qy 481 GAC C T T AGAAGT T CT AC C CAT GCT CACT TT CAT CAAT T CT GT C C CAAAAGAAGAGGGC AG 540 

I I I I I I I I II I II I I I I I I II I I I I I I I I I I I I I I I II I I I I II I I I I I I M I II I I I I I 
Db 481 GAC C T T AGAAGTT C T AC C CAT GCT CACT T T CAT CAAT T CT GT C C CAAAAGAAGAGGGC AG 540 

Qy 541 T AACT GC AT C GACT AT GCAAGT T CT G GAAAC C CTGAACACAAT C T C ATT T ACAGCCT CT G 600 

I I I I I I M I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
Db 541 T AACT GC AT C GACT AT G CAAGT T CT G GAAAC C CT GAACACAAT CT CATT T AC AG C CT CT G 600 



Qy 601 CCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACAAGAT 660 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I 

Db 601 CCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACAAGAT 660 

Qy 661 GGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACAAACC 72 0 

I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 

Db 661 GGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACAAACC 720 

Qy 721 CCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATCATAT 780 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I 

Db 721 CCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATCATAT 780 

Qy 781 C AT GC G CAAT T T GAGGAT C GC CT C AC GC CTGGAT AGT T GGC CACAAGGAT GT AC AC AGAA 840 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II II I 
Db 781 C AT GC G CAAT T T GAGGAT C GC CT C AC GC CT GGAT AGT T GGC CACAAGGAT GT AC AC AGAA 84 0 

Qy 841 GGCCATCAAATCTATATACACACTGACACGGCCTCTGGCCTTTCTGAACAGTGCCATCAA 900 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
Db 841 GGCCATCAAATCTATATACACACTGACACGGCCTCTGGCCTTTCTGAACAGTGCCATCAA 900 

Qy 901 T C C CAT CT T CT ACTT C CT CAT GGGAGAC CATT AC AGAGAGAT GCT GAT T AGTAAGT T C AG 960 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I M I I I I I I I I I I I I I I I I I I I I I I 
Db 901 T C C C AT CT T CT ACTT C CT CAT GGGAGAC CATTAC AGAGAGAT GCT GAT TAGT AAGT T CAG 960 

Qy 961 ACAATACTTCAAGTCCCTTACATCCTTCAGGACATGAGCTGCTGGATGCAGGTCTTCACT 1020 

I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I 

Db 961 ACAATACTTCAAGTCCCTTACATCCTTCAGGACATGAGCTGCTGGATGCAGGTCTTCACT 1020 

Qy 1021 C AGC CAAAAT GAGACACT T GATAAAC AGT GCT GT GCAGT T GAGT T TTAACTAAGT AAAC C 1080 

I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I II I I I I I 
Db 1021 C AGC CAAAAT GAGACACT T GATAAAC AGT GCT GT GCAGT T GAGT T T TAACTAAGTAAAC C 1080 

Qy 1081 ACCATTTCTAGGCTTTAGCTTTCCACCATCCTCCAACCCCCAGGGCTGGAGTACAAGCTG 1140 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1081 ACCATTTCTAGGCTTTAGCTTTCCACCATCCTCCAACCCCCAGGGCTGGAGTACAAGCTG 1140 

Qy 1141 GGT C C AC AT GAAT CAGAAGGC AGCT CT CT GTT CT GAT T T TAG GT TAT AC C CAGAGT AT GG 1200 

I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I i I I I I I I I I I I I I I I I 
Db 1141 GGT C C AC AT GAAT CAGAAG GC AGCT CT CT GTT CT GAT T T T AGGT TAT AC C CAGAGT AT GG 1200 

Qy 1201 AAAAAATAAGGCAT GAGAAAGCAT T GACAT CTT C ACT T AAGAACT GAACAAAAGAGAAC A 1260 

I I I I I I I I I I II I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I 
Db 1201 AAAAAATAAGGCAT GAGAAAGCAT T GACAT CTT C ACT TAAGAACT GAACAAAAGAGAAC A 1260 

Qy 1261 AAT AT T GT CAAT GT T T GGAC ACT T AG GAT CTGAAAT CTT G GAAAT TT T AAGAC CT CT TTT 1320 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 
Db 1261 AAT ATT GT CAAT GT T T GGAC ACT TAG GAT CTGAAAT CTT G GAAAT T T T AAGAC CT CT T TT 1320 

Qy 1321 T CT AT CAGT GTAAAAG GAAT ACAAGAT AGCTAGT T GCAAAT GCT GAAT GCAT T T CAT CAT 1380 

I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 
Db 1321 T CTAT CAGT GTAAAAGGAAT ACAAGAT AGCTAGTTGCAAATGCT GAAT GCATTT CAT CAT 1380 

Qy 1381 T GGT CAGGT C GATAAGC GT GT T T C T GAAAT AGT CTT AT TTT TAT T CT T GT AAT AT T AAAA 1440 

I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1381 T GGT CAGGT C GATAAGC GT GT T T CT GAAAT AGT CTTAT TTT T ATT CTT GTAAT AT TAAAA 1440 

Qy 1441 T T TAT GT GAAAAAT GAAT AT AAT T CAAT GTACAACAT T AGAT TT T CTAT T T GAAAAT TAT 1500 



1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

Db 1441 TTTATGTGAAAAAT GAATATAATT CAAT GTACAACATTAGATTTT CTATTT GAAAATTAT 1500 

Qy 1501 AT T T CT T GAAAAAATAACT G CT GT GC CTAAATAAAT CAAT AT A 1543 

I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 
Db 1501 ATT T CTT GAAAAAATAACT GCT GT GCCTAAATAAATCAATATA 1543 



RESULT 2 
AAD01135 

ID AAD01135 standard; cDNA; 1005 BP. 
XX 
AC AAD01135; 
XX 
DT 02-NOV-2000 (first entry) 
XX 
DE Human orphan G protein-coupled receptor hCHNIO cDNA. 
XX 
KW Human; orphan G protein-coupled receptor; GPCR; hCHNIO; drug screening; 
KW transmembrane receptor; expressed sequence tag; EST; signal cascade; ss. 
XX 
OS Homo sapiens. 
XX 
FH Key Location/Qualifiers 
FT CDS 1..1005 
FT /*tag= a 
FT /product= "hCHNIO" 
FT /note= "Human orphan G protein-coupled receptor" 
XX 
PN WO200031258-A2. 
XX 
PD 02-JUN-2000. 
XX 
PF 13-OCT-1999; 99WO-US23687. 
XX 
PR 20-NOV-1998; 98US-0109213. 
PR 16-FEB-1999; 99US-0120416. 
PR 26-FEB-1999; 99US-0121852. 
PR 12-MAR-1999; 99US-0123946. 
PR 12-MAR-1999; 99US-0123949. 
PR 28-MAY-1999; 99US-0136436. 
PR 28-MAY-1999; 99US-0136437. 
PR 28-MAY-1999; 99US-0136439. 
PR 28-MAY-1999; 99US-0136567. 
PR 28-MAY-1999; 99US-0137127. 
PR 28-MAY-1999; 99US-0137131. 
PR 29-JUN-1999; 99US-0141448. 
PR 29-SEP-1999; 99US-0156555. 
PR 29-SEP-1999; 99US-0156633. 
PR 29-SEP-1999; 99US-0156634. 
PR 29-SEP-1999; 99US-0156653. 
PR 01-OCT-1999; 99US-0157280. 
PR 01-OCT-1999; 99US-0157281. 
PR 01-OCT-1999; 99US-0157282. 
PR 01-OCT-1999; 99US-0157293. 
PR 01-OCT-1999; 99US-0157294. 
PR 12-OCT-1999; 99US-0416760. 
PR 12-OCT-1999; 99US-0417044. 
XX 

PA (AREN-) ARENA PHARM INC. 
XX 

PI Chen R, Dang HT, Liaw CW, Lin I; 
XX 

DR WPI; 2000-400068/34. 

DR P-PSDB; AAY71308. 
XX 

PT Novel human orphan G protein-coupled receptors and the encoding cDNAs 

PT for use in the identification of G protein-coupled receptor agonists - 
XX 

PS Claim 69; Page 86; 102pp; English. 
XX 

CC The present sequence is a cDNA encoding hCHNIO, an endogenous human 

CC orphan G protein-coupled receptor (GPCR), expressed in kidney and 

CC thyroid. The hCHNIO cDNA was identified using the human EST (expressed 

CC sequence tag) 1365839 as a probe. The orphan GPCR of the invention, like 

CC all GPCRs has seven transmembrane alpha helices with an extracellular 

CC N-terminus and an intracellular C-terminus. However, no endogenous 

CC ligands have yet been identified for the proteins of the invention. The 

CC orphan GPCRs may be used in the identification of their endogenous 

CC ligands, and to screen potential GPCR agonists and antagonists for use as 

CC pharmaceutical agents. The proteins may also be used in the study of 

CC GPCR-mediated signalling cascades, and to elucidate their precise role in 

CC normal and diseased human conditions. Nucleic acid encoding human orphan 

CC GPCRs may be used for tissue localisation expression analysis to provide 

CC information about their function in healthy and pathological states. 

XX 

SQ Sequence 1005 BP; 248 A; 236 C; 196 G; 325 T; 0 other; 

Query Match 38.4%; Score 592.4; DB 21; Length 1005; 

Best Local Similarity 75.5%; Pred. No. 3.2e-140; 

Matches 750; Conservative 0; Mismatches 241; Indels 3; Gaps 1; 

Qy 39 GCAGAAT GGCACAGAATTT ATCTT GTGAGAATT GGTT GGCAACAGAGGCT AT CTT GAAT A 98 

II I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I I I I 
Db 8 GGAT CAT GGCAT GGAATGCAACTT GCAAAAACT GGCT GGCAGCAGAGGCT GCC CT GGAAA 67 

Qy 99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 68 AGT ACT AC CT TT C C ATT T T T TAT GGGAT T GAGT T C GT T GT GGGAGT C CT T GGAAAT AC C A 127 

Qy 159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218 

III II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I 
Db 128 TTGTTGTTTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCT 187 

Qy 219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278 

I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I II II I I I I I I I I I I I I I I I 
Db 188 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 247 

Qy 279 AT GC CAAT GAT AAGGGGAC CT AT GGAGAT GTT CT CT GTAT AAGCAAC C GAT AT GT GCT T C 338 

I I I I I I I I I II III I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 248 AT GC C AAT GGAAACT GGAT AT AT G GAGAC GT GCT CT GCAT AAGCAAC C GAT AT GT G CT T C 307 



Qy 339 ACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGC 398 

I I I II I I I I I I I I I I I I I I I I I I I MINIM II | | | I I II I I I I I II 

Db 308 ATGCCAACCTCTATACCAGCATTCTCTTTCTCACTTTTATCAGCATAGATCGATACTTGA 367 



Qy 399 T CAT GAAGT AC C CTT T C C GAGAAC ACTT T CT AC AAAAGAAGGAAT TT GC CATTTTAAT CT 458 

I II I I I II I I I I I I I I I I I I I M I I I I I I I I 1 I I I II I I I II I I I I I I I I I I 
Db 368 TAAT T AAGT AT C CTTT C C GAGAAC AC CT T CT GCAAAAGAAAGAGTT T GCT AT TTT AAT CT 427 

Qy 459 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 428 CCTTGGCCATTTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 4 87 

Qy 519 CT GT C C CAAAAGAAGAGG GC AGT AACT G CAT C GACT AT GCAAGT T CT GGAAAC C CT GAAC 57 8 

I I I I II II I I I I I I I I I I I I I I I I I i I I I I I I M I I I I I 
Db 4 88 CT GT TAT AACT GACAAT GGC AC CACCT GT AAT GAT TTT GCAAGT T CT GGAGAC C CCAACT 547 

Qy 579 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638 

M I I I I I I I I I I I I I I I II I I II I I I I M I I I I IN I I I 

Db 548 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 607 

Qy 639 T GT GCTT CTTCTACT ACAAGAT GGT AGT CTT CTT AAAGAGGAGGAGC CAGC AGCAAGCAA 69 8 

I I I I I I I I I I I I I I I II I I I I I M I I I I I I I I I I Ml 

Db 608 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 667 

Qy 699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 668 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 727 

Qy 759 TACT CTT C ACAC C C TAT CAT AT CAT GC GCAAT T T GAGGAT CGC CT CAC GC CT G GAT AGT T 818 

I II II I I I I I I I I I I I I I I I I II III I I I I II I I I I I I I I I I I I I I I I I I 
Db 72 8 TGCTTTTTACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTT 7 87 

Qy 819 G GC CACAAG GAT GT ACAC AGAAGGC CAT CAAAT C TAT AT ACAC ACT GAC ACGGC CT C 875 

I || | II II Mi I I I I I I I I I I I I I I I I I I I I I I I II 

Db 788 GGAAGCAGTATCAGTGCACTCAGGTCGTCATCAACTCCTTTTACATTGTGACACGGCCTT 847 

Qy 876 TGGCCTTT CT GAAC AGT GC CAT CAAT CC C AT CTT CT ACT T CCT CAT G G GAGAC C ATT ACA 935 

I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I M M I I M I I I II I II 

Db 848 TGGCCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 907 

Qy 936 GAGAGATGCTGATTAGTAAGTT CAGACAATACTT CAAGT CCCTTACATCCTT CAGGACAT 995 

I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 
Db 908 GGGACAT GCT GAT GAAT CAACT GAGACACAACTT CAAAT CCCTTACAT CCTTTAGCAGAT 967 

Qy 996 GAGCT GCT G GAT GCAGGTCTTCACT CAGC CAAAA 102 9 

I I I I III I I I I I I I I I I II I 

Db 968 G G GCT CAT GAACT C CTACT T T CAT T CAGAGAAAA 10 01 
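
The percentages in the RESULT 2 summary lines can be cross-checked from the counts printed beside them. The sketch below is not part of the search program's output. It assumes that "Query Match" is the hit score divided by the perfect score (1543, from the report header) and that "Best Local Similarity" is the number of matches divided by the number of alignment columns (matches + mismatches + indels). Both readings are assumptions that happen to reproduce the printed figures; the function names are hypothetical.

```python
def query_match(score, perfect_score):
    """Hit score as a percentage of the perfect (self-match) score. Assumed definition."""
    return 100.0 * score / perfect_score


def local_similarity(matches, mismatches, indels):
    """Matches as a percentage of all alignment columns. Assumed definition."""
    return 100.0 * matches / (matches + mismatches + indels)


# Values copied from the RESULT 2 summary lines and the report header.
print(round(query_match(592.4, 1543), 1))       # compare with "Query Match 38.4%"
print(round(local_similarity(750, 241, 3), 1))  # compare with "Best Local Similarity 75.5%"
```

The same check applied to the counts reported for the 1380 BP hit (764 matches, 246 mismatches, 4 indels) gives 75.3, which agrees with the value printed for that record.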



RESULT 3 
AAA46036 

ID AAA46036 standard; cDNA; 1005 BP. 
XX 

AC AAA46036; 
XX 

DT 22-AUG-2000 (first entry) 
XX 

DE Human G protein coupled receptor hCHNIO encoding cDNA SEQ ID NO: 37. 
XX 



KW Human; G protein coupled receptor; GPCR; transmembrane receptor; 
KW identification; agonist; screening; therapeutic; pharmaceutical; 
KW mutant; ss. 
XX 
OS Homo sapiens. 
XX 
PN WO200022131-A2. 
XX 
PD 20-APR-2000. 
XX 
PF 13-OCT-1999; 99WO-US24065. 
XX 
PR 13-OCT-1998; 98US-0170496. 
PR 12-NOV-1998; 98US-0108029. 
PR 20-NOV-1998; 98US-0109213. 
PR 27-NOV-1998; 98US-0110060. 
PR 16-FEB-1999; 99US-0120416. 
PR 26-FEB-1999; 99US-0121852. 
PR 12-MAR-1999; 99US-0123944. 
PR 12-MAR-1999; 99US-0123945. 
PR 12-MAR-1999; 99US-0123946. 
PR 12-MAR-1999; 99US-0123948. 
PR 12-MAR-1999; 99US-0123949. 
PR 12-MAR-1999; 99US-0123951. 
PR 28-MAY-1999; 99US-0136436. 
PR 28-MAY-1999; 99US-0136437. 
PR 28-MAY-1999; 99US-0136439. 
PR 28-MAY-1999; 99US-0137127. 
PR 28-MAY-1999; 99US-0137131. 
PR 28-MAY-1999; 99US-0137567. 
PR 30-JUN-1999; 99US-0141448. 
PR 27-AUG-1999; 99US-0151114. 
PR 03-SEP-1999; 99US-0152524. 
PR 29-SEP-1999; 99US-0156633. 
PR 29-SEP-1999; 99US-0156555. 
PR 29-SEP-1999; 99US-0156634. 
XX 
PA (AREN-) ARENA PHARM INC. 
XX 
PI Behan DP, Lehmann-Bruinsma K, Chalmers DT, Chen R, Dang HT; 
PI Gore M, Liaw CW, Lin I, Lowitz K, White C; 
XX 
DR WPI; 2000-317986/27. 
DR P-PSDB; AAB02842. 
XX 
PT Non-endogenous, human G protein-coupled receptors for screening 
PT receptor, inverse or partial agonists useful as therapeutic agents 
XX 
PS Example 1; Page 116; 187pp; English. 
XX 
CC The present invention describes transmembrane receptors, preferably 
CC human G protein coupled receptors (GPCR), for which the endogenous 
CC ligand is unknown (orphan GPCR receptors). More specifically the present 
CC invention relates to non-endogenous, constitutively activated versions 
CC of a human GPCR. These non-endogenous human GPCRs can be useful for 
CC the direct identification of candidate compounds as receptors agonists, 
CC inverse agonists or partial agonists for use as pharmaceutical agents. 



CC AAA46017 to AAA46126 and AAB02825 to AAB02859 represent sequences used in 

CC the exemplification of the present invention. 

XX 

SQ Sequence 1005 BP; 248 A; 236 C; 196 G; 325 T; 0 other; 

Query Match 38.4%; Score 592.4; DB 21; Length 1005; 

Best Local Similarity 75.5%; Pred. No. 3.2e-140; 

Matches 750; Conservative 0; Mismatches 241; Indels 3; Gaps 1; 

Qy 3 9 GCAGAATGGCACAGAATTTAT CTT GT GAGAATT GGTTGGCAACAGAGGCTAT CT T GAATA 98 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 8 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 67 

Qy 99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158 

I I I I I I I II I I I I I I M I I II I I I I I I I I I I I I I I I I I I I I I II 

Db 68 AGT ACT AC CT T T C C ATT T T TT AT GGGATT GAGT T C GT T GTGGGAGT C CT T GGAAATAC C A 127 

Qy 159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218 

III II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I 

Db 12 8 T T GTT GT T T AC GGCT ACAT CTT CT C T CT GAAGAACT GGAAC AGC AGTAAT AT T TAT CT CT 187 

Qy 219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278 

II I I I I I II I II I II I I I I I I I I I I I I I I II I I I I II I I I II I I I I I I I I I I 

Db 18 8 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 247 

Qy 27 9 AT GCC AAT GAT AAGGGGAC CT AT G GAGAT GT T CT CT GT AT AAG CAAC C GAT AT GT GCT T C 338 

I I I I I I I I I II III I I I I ! I I I M I I I M I I I I I I I I I I I I I I I I I I I I I I 

Db 24 8 AT GCCAAT G GAAACT GGAT AT AT GGAGACGT GCT CT GCATAAGCAAC C GAT AT GT GCT T C 307 

Qy 339 ACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGC 398 

I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I II 

Db 308 ATGCCAACCTCTATACCAGCATTCTCTTTCTCACTTTTATCAGCATAGATCGATACTTGA 367 

Qy 399 T CAT GAAGT AC C CT TT C C GAGAACACT TTCT ACAAAAGAAGGAAT T T GC CAT T T TAAT CT 458 

I II I I I I I I I II I I I I I I I I I M I I I I II I I I I I I II I I I M III I I I I I II 
Db 368 TAATTAAGT AT CC T TT C C GAGAAC AC CTTCT GCAAAAGAAAGAGT T T GCT AT T T TAAT C T 427 

Qy 4 59 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518 

I I I I I I I I I I I I I I I 1 I I I I I M I I I I I I I I I I I I I I I I I I 
Db 42 8 C CT TGGC CAT T T GGGTT T T AGT AAC CT TAGAGT T ACT AC CCAT ACT T C C C CTT AT AAAT C 4 87 

Qy 519 CT GT CC CAAAAGAAGAGGGC AGT AACT GC AT C GACT AT GCAAGT T CT GGAAACC CT GAAC 578 

I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 488 CT GTT AT AACT GACAAT GGCACCACCT GTAAT GATTTT GCAAGT T CT GGAGACC CCAACT 547 

Qy 57 9 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638 

I I I I I I I I I I I I I I I I I II II II I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 54 8 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 607 

Qy 639 T GT GCT T CTT CT ACT ACAAGAT GGT AGT CT T CT T AAAGAGGAGGAGC C AGCAGCAAGCAA 698 

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I Mill I I I I III 

Db 608 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 667 



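Qy 699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758 

Db 668 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 727 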



Qy 759 TACT CT T CAC AC C CT AT CAT AT CAT GC GCAAT T T GAGGAT C GC C T CACG C CT G GAT AGTT 818 

I II II I I I I I I I I I II I I I I I I I III III [ T I I I I I I I I I I I I I I 

Db 728 T GCT T T T T ACAC C C T AT CAC GT CAT GC GGAAT GT GAGGAT C GCT T CACGC CT GGGGAGTT 7 87 

Qy 819 G GC C AC AAGGAT GT ACAC AGAAGGC CAT CAAAT CT AT AT AC AC ACT GACAC GGC CT C 875 

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 788 GGAAGCAGTATCAGTGCACTCAGGTCGTCATCAACTCCTTTTACATTGTGACACGGCCTT 847 

Qy 876 T GGC C TT T CT GAAC AGT GC C AT CAAT C C CAT CT T CT AC T T C CT CAT GGGAGACC AT T ACA 935 

I I I I I I I II I I I I I I I II I I I I I I M I I I I I I I II II I I I I I I I II I II 
Db 848 T GGC CT T T C T GAACAGT GT CAT CAACC CT GT CT T CT AT TTTCTTTT GGGAGAT C ACT T CA 907 

Qy 936 GAGAGAT GCT GAT T AGT AAGTT CAGACAATACT T CAAGT C C CT T ACAT C CT T CAGGAC AT 995 

I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 
Db 908 GGGACAT GCT GAT GAAT CAACT GAGACACAACTT CAAAT CCCTTACATC CTTTAGCAGAT 967 

Qy 996 GAGCTGCTGGATGCAGGTCTTCACTCAGCCAAAA 102 9 

I I I I III I I I I I I I I I Ml 

Db 968 GGGCT CAT GAAC T C CT ACTTT CAT T C AGAGAAAA 1001 



RESULT 4 
ABZ42542 

ID ABZ42542 standard; DNA; 1380 BP. 
XX 
AC ABZ42542; 
XX 
DT 04-MAR-2003 (first entry) 
XX 
DE Human purinergic receptor P2U2 nucleotide SEQ ID NO: 566. 
XX 
KW G protein-coupled receptor; GPCR; antigenic peptide; gene therapy; 
KW G protein-coupled receptor modulator; antibody; immune-related disease; 
KW growth-related disease; cell regeneration-related disease; AIDS; cancer; 
KW immunological-related cell proliferative disease; autoimmune disease; 
KW Alzheimer's disease; atherosclerosis; infection; osteoarthritis; allergy; 
KW osteoporosis; cardiomyopathy; inflammation; Crohn's disease; diabetes; 
KW graft versus host disease; Parkinson's disease; multiple sclerosis; pain; 
KW psoriasis; anxiety; depression; schizophrenia; dementia; memory loss; 
KW mental retardation; epilepsy; asthma; tuberculosis; obesity; nausea; 
KW hypertension; hypotension; renal disorder; rheumatoid arthritis; trauma; 
KW ulcer; gene; ds. 
XX 
OS Homo sapiens. 
XX 
PN WO200261087-A2. 
XX 
PD 08-AUG-2002. 
XX 
PF 19-DEC-2001; 2001WO-US50107. 
XX 
PR 19-DEC-2000; 2000US-257144P. 
XX 
PA (LIFE-) LIFESPAN BIOSCIENCES INC. 
XX 
PI Burmer GC, Roush CL, Brown JP; 
XX 





DR WPI; 2003-046718/04. 

DR P-PSDB; ABP81696. 
XX 

PT New isolated antigenic peptides e.g., for G protein-coupled receptors 

PT (GPCR), useful for diagnosing and designing drugs for treating 

PT conditions in which GPCRs are involved, e.g. AIDS, Alzheimer's disease, 

PT cancer or autoimmune diseases 

XX 

PS Disclosure; Fig 1; 523pp; English. 
XX 

CC The present invention describes antigenic peptides (I) comprising: 

CC (a) any one of 1601 sequences (see ABP82019 to ABP83619) of 12-24 amino 

CC acids. Also described: (1) an assay for the detection of a particular 

CC G protein-coupled receptor (GPCR) or a candidate polypeptide in a sample; 

CC and (2) an isolated antibody having high specificity and high affinity 

CC or avidity for a particular GPCR. (I) can be used as GPCR modulators and 

CC in gene therapy. The antigenic peptides for GPCRs are useful in detecting 

CC an antibody against a particular GPCR, and in the production of specific 

CC antibodies. The peptides and antibodies are also useful for detecting the 

CC presence or absence of corresponding GPCRs. The antigenic peptides for 

CC GPCRs and antibodies are useful for diagnosing and designing drugs for 

CC treating immune-related diseases, growth-related diseases, cell 

CC regeneration-related disease, immunological-related cell proliferative 

CC diseases, or autoimmune diseases, e.g. AIDS, Alzheimer's disease, 

CC atherosclerosis, bacterial, fungal, protozoan or viral infections, 

CC osteoarthritis, osteoporosis, cancer, cardiomyopathy, chronic and acute 

CC inflammation, allergies, Crohn's disease, diabetes, graft versus host 

CC disease, Parkinson's disease, multiple sclerosis, pain, psoriasis, 

CC anxiety, depression, schizophrenia, dementia, mental retardation, memory 

CC loss, epilepsy, asthma, tuberculosis, obesity, nausea, hypertension, 

CC hypotension, renal disorders, rheumatoid arthritis, trauma, ulcers, or 

CC any other disorder in which GPCRs are involved. The antibodies may be 

CC used in immunoassays and immunodiagnosis. ABZ42523 to ABZ42869 encode 

CC GPCR proteins given in ABP81675 to ABP82018, which are used in the 

CC exemplification of the present invention. 

XX 

SQ Sequence 1380 BP; 383 A; 294 C; 274 G; 429 T; 0 other; 

Query Match 38.4%; Score 592.4; DB 25; Length 1380; 

Best Local Similarity 75.3%; Pred. No. 3.7e-140; 

Matches 764; Conservative 0; Mismatches 246; Indels 4; Gaps 2; 

Qy 39 GCAGAAT GGC ACAGAAT TT AT CT T GT GAGAAT T G GT T GGCAAC AGAGGCT AT CTT GAAT A 98 

I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 50 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 109 

Qy 99 AGT ACT AC CT CT CT G C ATTT T AT G CAAT C GAGT T CAT T T T T G GACT GCT T GGGAAT GT CA 15 8 

I I 1 I I I i I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 110 AGT AC T AC CT TT C C ATT T TT T AT G GGAT T GAGT T C GT T GT GG GAGT CCT T GGAAAT AC CA 169 

Qy 159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218 

III II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I Ml I I I I I I I 

Db 170 TTGTTGTTTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCT 22 9 

Qy 219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 27 8 

I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 230 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 28 9 



Qy 27 9 AT GCCAAT GAT AAGGG GAC CT AT GGAGAT GT T CT CT GT ATAAGCAAC C GAT AT GT GCT T C 33 8 

I I I I I I I I I II III I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 290 AT GCCAAT GGAAACT GGAT ATAT GGAGACGT GCT CT GCATAAGCAACCGAT AT GT GCTT C 34 9 

Qy 339 ACAC CAAC CT CT AC AC C AGCAT CCTCTTCCT C ACT T T CATT AG CAT GGAC C GAT AT CT GC 398 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I II 
Db 350 AT GCCAAC CT CT AT AC C AGCATT CTCTTTCT C AC T T T TAT CAG CAT AGAT C GAT ACTT GA 4 09 

Qy 399 T CATGAAGT ACC CTT T C CGAGAACACT T T CT ACAAAAGAAGGAAT TT GC CATT TT AAT CT 458 

I II I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I M 

Db 410 TAATTAAGTATCCTTTCCGAGAACACCTTCTGCAAAAGAAAGAGTTTGCTATTTTAATCT 4 69 

Qy 4 59 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518 

I I I I I I I I I I I I I I I I I I I M I I I I II I I I I I I I I I I I I I I 

Db 470 CCTTGGCCATTTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 52 9 

Qy 519 CT GT C CCAAAAGAAGAGGGCAGTAACT GCAT CGACTAT GCAAGTT CT GGAAACCCT GAAC 57 8 

I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 530 CTGTTATAACTGACAATGGCACCACCTGTAATGATTTTGCAAGTTCTGGAGACCCCAACT 58 9 

Qy 579 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 63 8 

I I I I I I I I I I I I I M I I II II II I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 590 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 64 9 

Qy 639 T GT GCT T CT T CT ACT ACAAGAT G GT AGT CT T CT TAAAGAGGAG GAGCCAGC AGCAAGCAA 698 

I I I I I I II I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I III 

Db 650 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 709 

Qy 699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758 

I I I I Mill II M I I I I I I I I I I II I I I I II M I I I I I I I I I 

Db 710 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 7 69 

Qy 759 TACT CTT CAC AC C CT AT CAT AT C ATGC G CAATT T GAGGAT C GC C T C AC GC CT GGAT AGT T 818 

I II II I I I I I I I I I I I I I I I I I I III I I I I I I I I I I I I I II I I I I I I I I I 
Db 770 TGCTTTTTACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTT 829 

Qy 819 G GCCACAAGGAT GTACACAGAAGGCCAT CAAAT CT AT AT ACACACT GACACGGCCT C 87 5 

I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 

Db 830 GGAAGCAGTATCAGTGCACTCAGGTCGTCATCAACTCCTTTTACATTGTGACACGGCCTT 889 

Qy 876 TGGCCTTTCT GAAC AGT GC CAT CAAT C C CAT CT T CT ACT T C CT CAT GGGAGAC CAT T AC A 935 

I I I I I I I I I I I I I I I I I I I I II I I II I I I I I I I II II I I I I I I I II I II 
Db 890 TGGCCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 94 9 

Qy 936 GAGAGAT GCT GATT AGTAAGTT CAGACAATACTT CAAGT CCCTT ACAT C CTT CAGGACAT 995 

I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 950 GGGACAT GCT GAT GAAT CAACT GAGACACAACTT CAAAT CCCTT ACAT C CTTTAGCAGAT 1009 

Qy 996 GAG CT GC T GGAT GC AG GT CT T C ACT C AGC CAAAA- T GAGAC ACT T GAT AAAC AG 104 8 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I 

Db 1010 GG GCT CAT GAACT CCT ACT T T CAT T C AGAGAAAAGT G AGGGGCT T GT GAAACAG 1063 



RESULT 5 
ABL90790 

ID ABL90790 standard; cDNA; 1436 BP. 



XX 

AC ABL90790; 
XX 

DT 24-MAY-2002 (first entry) 
XX 

DE Human polynucleotide SEQ ID NO 1352. 
XX 

KW Cytostatic; immunosuppressive; nootropic; neuroprotective; antiviral; 

KW antiallergic; hepatotropic; antidiabetic; antiinflammatory; antiulcer; 

KW vulnerary; anticonvulsant; antibacterial; antifungal; antiparasitic; 

KW cardiant; gene therapy; cancer; immune disorder; cardiovascular disorder; 

KW neurological disease; infection; human; secreted protein; gene; ss. 

XX 

OS Homo sapiens. 
XX 

PN WO200190304-A2. 
XX 

PD 29-NOV-2001. 
XX 

PF 18-MAY-2001; 2001WO-US16450. 
XX 

PR 19-MAY-2000; 2000US-205515P. 
XX 

PA (HUMA-) HUMAN GENOME SCI INC. 
XX 

PI Birse CE, Rosen CA; 
XX 

DR WPI; 2002-122018/16. 

DR P-PSDB; ABB90381. 
XX 

PT Novel 1405 isolated polypeptides, useful for diagnosis, treatment and 

PT prevention of neural, immune system, muscular, reproductive, 

PT gastrointestinal, pulmonary, cardiovascular, renal and proliferative 

PT disorders - 

XX 

PS Claim 4; SEQ ID NO 1352; 2081pp + Sequence Listing; English. 
XX 

CC The invention relates to novel genes (ABL89449-ABL90853) and proteins 

CC (ABB89040-ABB90444) useful for preventing, treating or ameliorating 

CC medical conditions e.g. by protein or gene therapy. The genes are 

CC isolated from a range of human tissues disclosed in the specification. 

CC The nucleic acids, proteins, antibodies and (ant)agonists are useful 

CC in the diagnosis, treatment and prevention of: (a) cancer, e.g. breast 

CC and ovarian cancer and other cancers of the adrenal gland, bone, bone 

CC marrow, breast, gastrointestinal tract, liver, lung, or urogenital; 

CC (b) immune disorders e.g. Addison's disease, allergies, autoimmune 

CC haemolytic anaemia, autoimmune thyroiditis, diabetes mellitus, Crohn's 

CC disease, multiple sclerosis, rheumatoid arthritis and ulcerative 

CC colitis; (c) cardiovascular disorders such as myocardial ischaemias; 

CC (d) wound healing; (e) neurological diseases e.g. cerebral anoxia and 

CC epilepsy; and (f) infectious diseases such as viral, bacterial, fungal 

CC and parasitic infections. 

CC Note: The sequence data for this patent did not form part of the 

CC printed specification, but was obtained in electronic format directly 

CC from WIPO at ftp.wipo.int/pub/published_pct_sequences. 

XX 

SQ Sequence 1436 BP; 397 A; 309 C; 289 G; 441 T; 0 other; 



Query Match 38.4%; Score 592.4; DB 24; Length 1436; 

Best Local Similarity 75.3%; Pred. No. 3.7e-140; 

Matches 764; Conservative 0; Mismatches 246; Indels 4; Gaps 2; 



Qy 39 GCAGAAT GGCAC AGAAT T TAT CT T GT GAGAAT T GGT T GGCAACAGAGGCTAT CT T GAAT A 98 

II MINI I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 100 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 159 

Qy 99 AGT ACT AC C T CT CT GCAT T TT AT GCAATC GAGT T CAT T TT T G GACT GCT T GGGAAT GT C A 158 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 

Db 160 AGT ACT AC CT T T C CATTT T T TAT GGGATT GAGTT C GT T GT GGGAGT C C T T GGAAAT AC C A 219 

Qy 159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218 

Ml II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I 

Db 220 TTGTTGTTTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCT 279 

Qy 219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278 

I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I 

Db 2 80 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 339 

Qy 279 AT GCCAAT GATAAGGGGAC CT AT GGAGAT GT T CT CT GT AT AAG CAAC C GAT AT GT G CT T C 338 

MINIMI II Ml I I II I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 340 AT GCCAAT GGAAAC T GGAT AT AT GGAGAC GT GCT CT G C AT AAGCAAC C GAT AT GT G CT T C 399 

Qy 339 ACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGC 398 

I I I I I I I I I I IM I II ! I I I I I I I I I I I I I I I I II I I I I I II I M II II 
Db 4 00 AT GCCAACCT CT ATACCAGCATT CT CTTT CT CACTTTT ATCAGCAT AGAT CGATACTT GA 459 

Qy 399 T CATGAAGT AC C CT T T CC GAGAAC ACTTT CT ACAAAAGAAGGAAT T T GC CATT T TAAT CT 458 

I II I I I I I II II II II II I II II I M I I II I I I I I II I I I I I I I II I I I I I I 
Db 4 60 TAATT AAGT AT C CTT T CC GAGAAC AC CTT CT G CAAAAGAAAGAGT T T GCT ATT T TAAT CT 519 

Qy 459 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518 

I I I I I I I I I I I II I I II II II I I I I I I I I I II I I I I I I I I I 

Db 520 CCTTGGCCATTTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 579 

Qy 519 CT GT CC CAAAAGAAGAGGGCAGT AACT GCAT C GAC TAT GCAAGT T CT G GAAAC C CT GAAC 578 

I I I I II II I I I I I INI! I I I I I I I I I I I I I I I I I I II I 
Db 58 0 CT GTT AT AACT GAC AAT GGCAC CAC CT GT AAT GAT T T T GCAAGT T CT GGAGACC C CAAC T 639 

Qy 579 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638 

I I I I I I I II I I I I I I I I II II II I I I I II I I I I I I I I I I I I I I I I II I I 
Db 64 0 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 699 

Qy 639 TGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAA 698 

I I I I I I I I I I I I I I I I I I I I I M II I I I I I I I I I I I I I I III 

Db 700 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 759 

Qy 699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 7 60 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 819 

Qy 759 T ACT CT T CAC AC C CT AT CAT AT CAT GC G CAAT T T GAGGAT CGC CT CAC GC CT GGAT AGT T 818 

I II II I I I I I II I I I I I II I I I I III I I I I I I I I I I II I I I I I I I I IN 
Db 82 0 TGCTTTTTACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTT 87 9 



Qy 819 G GC CACAAGGAT GT AC ACAGAAGGC C AT CAAAT C T AT AT ACAC ACT GACAC GGC CT C 875 

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 88 0 GGAAGCAGTATCAGTGCACTCAGGTCGTCATCAACTCCTTTTACATTGTGACACGGCCTT 939 

Qy 87 6 TGGCCTTTCT GAACAGT GC C AT CAAT C C CAT CT T CT ACTT C C T CATGGGAGACC AT TAC A 935 

I I I I I I I I I I I I I I I I I I I I II I I II I I I I I I I II II I I I I I I I II I II 

Db 94 0 TGGCCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 999 

Qy 936 GAGAGAT GCT GAT T AGT AAGTT CAGACAATAC T T CAAGT C C CTTAC AT C CT T CAGGACAT 995 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1000 GGGACAT GCT GAT GAAT CAACT GAGAC ACAACT T CAAAT C C CT TACAT C CT T T AGC AGAT 1059 

Qy 996 GAGCTGCT GGAT GCAGGTCTTCACTCAGC CAAAA- T GAGAC ACTT GATAAACAG 1048 

I I I I III I I I I I I I I I I I I I I I I I I I I I I II I I I 

Db 1060 GGG CT CAT GAACT C CT ACTT T CAT T C AGAGAAAAGT GAGGG GCTT GT GAAACAG 1113 



RESULT 6
ACC46165
ID ACC46165 standard; cDNA; 1473 BP.
XX
AC ACC46165;
XX
DT 02-JUN-2003 (first entry)
XX
DE Human dithp receptor-encoding cDNA.
XX
KW Human; dithp; diagnostic and therapeutic polynucleotide; diagnosis;
KW cancer; cell proliferative disorder; autoimmune disorder;
KW inflammatory disorder; infection; hormonal disorder; metabolic disorders;
KW neurological disorder; gastrointestinal disorder; transport disorder;
KW connective tissue disorder; drug screening; proteome analysis;
KW gene therapy; antisense therapy; genotyping; transgenic animal; knock in;
KW disease model; toxicological testing; transcript imaging;
KW receptor; gene; ss.
XX
OS Homo sapiens.
XX
PN WO200297031-A2.
XX
PD 05-DEC-2002.
XX
PF 27-MAR-2002; 2002WO-US10056.
XX
PR 28-MAR-2001; 2001US-279619P.
PR 29-MAR-2001; 2001US-280067P.
PR 29-MAR-2001; 2001US-280068P.
PR 16-MAY-2001; 2001US-291280P.
PR 17-MAY-2001; 2001US-291829P.
PR 17-MAY-2001; 2001US-291849P.
PR 19-JUN-2001; 2001US-299428P.
PR 20-JUN-2001; 2001US-299776P.
PR 20-JUN-2001; 2001US-300001P.
XX
PA (INCY-) INCYTE GENOMICS INC.
XX
PI Daffo A, Jones AL, Tran AB, Dahl CR, Gietzen D, Chinn J;
PI Dufour GE, Hillman JL, Yu JY, Tuason O, Yap PE, Amshey SR;
PI Daughtery SC, Dam TC, Liu TF, Nguyen DA, Kleefeld Y, Gerstin EH;
PI Peralta CH, David MH, Lewis SA, Chen AJ, Panzer SR, Harris B;
PI Flores V, Marwaha R, Lo A, Lan RY, Urashka ME;

XX 

DR WPI; 2003-129518/12. 

DR P-PSDB; ABR41222. 
XX 

PT Novel human diagnostic and therapeutic polypeptide useful for 

PT identifying test compound which specifically binds to a polypeptide 

PT encoded by human diagnostic and therapeutic polynucleotide, and to 

PT induce antibodies 

XX 

PS Claim 2; SEQ ID No 86; 591pp; English. 
XX 

CC The invention relates to novel human diagnostic and therapeutic 

CC polynucleotides designated dithp (ACC46080-ACC46749) and to their

CC encoded proteins (DITHP; ABR41136-ABR41812). The invention also relates

CC to polynucleotide sequences at least 90% identical to the dithp cDNA 

CC sequences of the invention; recombinant vectors, host cells and 

CC transgenic organisms comprising a dithp nucleic acid sequence; the 

CC recombinant production of DITHP proteins; antibodies specific for DITHP 

CC proteins; microarrays comprising dithp nucleic acid sequences; methods 

CC of detecting dithp nucleotide and protein sequences; methods of screening 

CC for compounds which specifically bind a DITHP protein; and methods of 

CC assessing the toxicity of test compounds using a dithp hybridisation 

CC probe. Dithp nucleic acid sequences and DITHP proteins may be used in the 

CC diagnosis of a wide variety of conditions including cancer and other cell 

CC proliferative disorders; autoimmune or inflammatory disorders; bacterial, 

CC viral, fungal or parasitic infections; hormonal disorders; metabolic 

CC disorders; neurological disorders; gastrointestinal disorders; transport 

CC disorders; and connective tissue disorders. They may also be used to 

CC screen for modulators of protein activity or gene expression. DITHP 

CC proteins can additionally be used in analysis of the proteome of a tissue 

CC or cell type and to induce antibodies. The dithp nucleic acids are 

CC additionally useful in somatic or germline gene therapy of the disorders 

CC mentioned above, as a source of antisense sequences, as a source of 

CC probes and primers, in genotyping and identification of individuals, in 

CC the generation of transgenic animal models of human disease or knock in 

CC humanised animals, in toxicological testing, and in transcript imaging. 

CC The present sequence represents a dithp cDNA encoding a DITHP protein 

CC which has receptor activity. 

CC Note: The sequence data for this patent did not form part of the printed 

CC specification, but was obtained in electronic format directly from WIPO 

CC at ftp.wipo.int/pub/published_pct_sequences. 
XX 

SQ Sequence 1473 BP; 403 A; 320 C; 303 G; 447 T; 0 other; 

Query Match 38.4%; Score 592.4; DB 25; Length 1473; 

Best Local Similarity 75.3%; Pred. No. 3.8e-140; 

Matches 764; Conservative 0; Mismatches 246; Indels 4; Gaps 2; 

Qy   39 GCAGAATGGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATA 98
        | |  ||||||  ||||  | ||||  | || ||| ||||| ||||||||  | || | |
Db  119 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 178

Qy   99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158
        |||||||||| ||    |||||||  || |||||| || | ||| | ||||| |||  ||
Db  179 AGTACTACCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCA 238

Qy  159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218
         ||| || | ||||||| ||||||   ||||||||||||||||||| ||| | ||||| |
Db  239 TTGTTGTTTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCT 298

Qy  219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278
        ||||||| ||  |||||||||| ||||| ||||||||||| ||||| ||||||| |||||
Db  299 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 358

Qy  279 ATGCCAATGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTC 338
        |||||||||  ||  |||  |||||||| || ||||| ||||||||||||||||||||||
Db  359 ATGCCAATGGAAACTGGATATATGGAGACGTGCTCTGCATAAGCAACCGATATGTGCTTC 418

Qy  339 ACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGC 398
        |  |||||||||| |||||||| ||||| |||||||| || ||||| || |||||  ||
Db  419 ATGCCAACCTCTATACCAGCATTCTCTTTCTCACTTTTATCAGCATAGATCGATACTTGA 478

Qy  399 TCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCT 458
        | || ||||| ||||||||||||||| |||| |||||||| || ||||| ||||||||||
Db  479 TAATTAAGTATCCTTTCCGAGAACACCTTCTGCAAAAGAAAGAGTTTGCTATTTTAATCT 538

Qy  459 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518
        |  ||||  | ||||  ||||| ||||||||  | |||||||| ||  |  | || |||
Db  539 CCTTGGCCATTTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 598

Qy  519 CTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAAC 578
        ||||   ||  ||  | ||||  | ||| |  || | ||||||||||||| ||||  |
Db  599 CTGTTATAACTGACAATGGCACCACCTGTAATGATTTTGCAAGTTCTGGAGACCCCAACT 658

Qy  579 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638
        |||| |||||||||||| | || || ||  ||||||| ||||| |||||||| | |||||
Db  659 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 718

Qy  639 TGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAA 698
        |||| ||||| || |||||||| |   ||||| |||||  |||||    ||||   || |
Db  719 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 778

Qy  699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758
        |||| ||||| || || || || |    | ||||  || |||| || || |||||||||
Db  779 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 838

Qy  759 TACTCTTCACACCCTATCATATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTT 818
        | || || |||||||||||  ||||||| ||| |||||||||| ||||||||||  ||||
Db  839 TGCTTTTTACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTT 898

Qy  819 G---GCCACAAGGATGTACACAGAAGGCCATCAAATCTATATACACACTGACACGGCCTC 875
        |   ||   |    || || |||   | |||||| ||  | ||||   |||||||||||
Db  899 GGAAGCAGTATCAGTGCACTCAGGTCGTCATCAACTCCTTTTACATTGTGACACGGCCTT 958

Qy  876 TGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTCCTCATGGGAGACCATTACA 935
        |||||||||||||||||| |||||| ||  ||||||| || ||  ||||||| || | ||
Db  959 TGGCCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 1018

Qy  936 GAGAGATGCTGATTAGTAAGTTCAGACAATACTTCAAGTCCCTTACATCCTTCAGGACAT 995
        | || |||||||| | | |  | |||||  ||||||| |||||||||||||| || | ||
Db 1019 GGGACATGCTGATGAATCAACTGAGACACAACTTCAAATCCCTTACATCCTTTAGCAGAT 1078

Qy  996 GAGCTGCTGGATGCAGGTCTTCACTCAGCCAAAA-TGAGACACTTGATAAACAG 1048
        | |||  || |  |     |||| ||||  |||| ||||   ||||  ||||||
Db 1079 GGGCTCATGAACTCCTACTTTCATTCAGAGAAAAGTGAGGGGCTTGTGAAACAG 1132



RESULT 7 
AAD24958 

ID AAD24958 standard; cDNA; 1542 BP. 
XX 

AC AAD24958; 
XX 

DT 12-MAR-2002 (first entry) 
XX 

DE Human G-protein coupled receptor-3 (GCREC-3) cDNA. 
XX 

KW Human; G-protein coupled receptor-3; GCREC-3; therapy; cancer; stroke; 

KW cell proliferative disorder; neurological; epilepsy; Parkinson's disease; 

KW Alzheimer's disease; inflammation; thyroiditis; haemolytic anaemia; AIDS; 

KW Acquired Immune Deficiency Syndrome; dementia; nootropic; cholelithiasis; 

KW multiple sclerosis; atherosclerosis; angina pectoris; gastroenteritis; 

KW diabetes; ulcer; viral infection; immunosuppressive; ss. 
XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 

FT CDS 63..1202

FT /*tag= a 

FT /product= "Human GCREC-3 protein"

XX 

PN WO200198351-A2. 
XX 

PD 27-DEC-2001. 
XX 

PF 15-JUN-2001; 2001WO-US19275.
XX

PR 16-JUN-2000; 2000US-212483P.

PR 22-JUN-2000; 2000US-213954P.

PR 29-JUN-2000; 2000US-215209P.

PR 07-JUL-2000; 2000US-216595P.

PR 14-JUL-2000; 2000US-218936P.

PR 19-JUL-2000; 2000US-219154P.

PR 21-JUL-2000; 2000US-220141P.
XX 

PA (INCY-) INCYTE GENOMICS INC. 
XX 

PI Lai P, Baughn MR, Hafalia AJA, Nguyen DB, Gandhi AR, Kallick DA; 

PI Griffin JA, Yue H, Khan FA, Patterson C, Lu DAM, Tribouley CM; 

PI Lu Y, Walia NK, Graul R, Yao MG, Yang J, Ramkumar J, Au-Young J; 

PI Elliott VS, Hernandez R, Walsh RT, Borowsky ML, Thornton M, He A; 
XX 

DR WPI; 2002-075627/10. 

DR P-PSDB; AAE15633. 
XX 

PT Isolated human G-protein coupled receptor polypeptides and the use of 

PT these sequences in the diagnosis, treatment and prevention of diseases 



PT and in the assessment of exogenous compounds on the expression of the 

PT receptors - 

XX 

PS Claim 11; Page 133; 143pp; English. 
XX 

CC The invention relates to isolated human G-protein coupled receptor 

CC (GCREC) polypeptides and their biologically active fragments. GCREC and 

CC protein is useful in treating a disease or condition associated with an 

CC increase or decrease in expression of functional GCREC. The GCRECs are

CC useful in the diagnosis, treatment and prevention of cell proliferative 

CC disorders (cancer, leukaemia, melanoma) ; neurological disorders (stroke, 

CC epilepsy, Parkinson's disease, dementia, Alzheimer's disease); autoimmune 

CC inflammatory disorder (thyroiditis, haemolytic anaemia, AIDS, multiple 

CC sclerosis); cardiovascular disorder (atherosclerosis, angina pectoris), 

CC gastrointestinal disorder (ulcer, cholelithiasis, gastroenteritis), 

CC metabolic disorders (diabetes); viral infections (herpes virus) and in 

CC the assessment of the effects of exogenous compounds on the expression 

CC of the nucleic acid and amino acid sequences. The present sequence is 

CC human GCREC-3 cDNA.
XX 

SQ Sequence 1542 BP; 428 A; 327 C; 315 G; 472 T; 0 other; 



Query Match 38.4%; Score 592.4; DB 24; Length 1542; 

Best Local Similarity 75.3%; Pred. No. 3.8e-140; 

Matches 764; Conservative 0; Mismatches 246; Indels 4; Gaps 2;

Qy   39 GCAGAATGGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATA 98
        | |  ||||||  ||||  | ||||  | || ||| ||||| ||||||||  | || | |
Db  205 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 264

Qy   99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158
        |||||||||| ||    |||||||  || |||||| || | ||| | ||||| |||  ||
Db  265 AGTACTACCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCA 324

Qy  159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218
         ||| || | ||||||| ||||||   ||||||||||||||||||| ||| | ||||| |
Db  325 TTGTTGTTTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCT 384

Qy  219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278
        ||||||| ||  |||||||||| ||||| ||||||||||| ||||| ||||||| |||||
Db  385 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 444

Qy  279 ATGCCAATGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTC 338
        |||||||||  ||  |||  |||||||| || ||||| ||||||||||||||||||||||
Db  445 ATGCCAATGGAAACTGGATATATGGAGACGTGCTCTGCATAAGCAACCGATATGTGCTTC 504

Qy  339 ACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGC 398
        |  |||||||||| |||||||| ||||| |||||||| || ||||| || |||||  ||
Db  505 ATGCCAACCTCTATACCAGCATTCTCTTTCTCACTTTTATCAGCATAGATCGATACTTGA 564

Qy  399 TCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCT 458
        | || ||||| ||||||||||||||| |||| |||||||| || ||||| ||||||||||
Db  565 TAATTAAGTATCCTTTCCGAGAACACCTTCTGCAAAAGAAAGAGTTTGCTATTTTAATCT 624

Qy  459 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518
        |  ||||  | ||||  ||||| ||||||||  | |||||||| ||  |  | || |||
Db  625 CCTTGGCCATTTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 684

Qy  519 CTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAAC 578
        ||||   ||  ||  | ||||  | ||| |  || | ||||||||||||| ||||  |
Db  685 CTGTTATAACTGACAATGGCACCACCTGTAATGATTTTGCAAGTTCTGGAGACCCCAACT 744

Qy  579 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638
        |||| |||||||||||| | || || ||  ||||||| ||||| |||||||| | |||||
Db  745 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 804

Qy  639 TGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAA 698
        |||| ||||| || |||||||| |   ||||| |||||  |||||    ||||   || |
Db  805 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 864

Qy  699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758
        |||| ||||| || || || || |    | ||||  || |||| || || |||||||||
Db  865 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 924

Qy  759 TACTCTTCACACCCTATCATATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTT 818
        | || || |||||||||||  ||||||| ||| |||||||||| ||||||||||  ||||
Db  925 TGCTTTTTACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTT 984

Qy  819 G---GCCACAAGGATGTACACAGAAGGCCATCAAATCTATATACACACTGACACGGCCTC 875
        |   ||   |    || || |||   | |||||| ||  | ||||   |||||||||||
Db  985 GGAAGCAGTATCAGTGCACTCAGGTCGTCATCAACTCCTTTTACATTGTGACACGGCCTT 1044

Qy  876 TGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTCCTCATGGGAGACCATTACA 935
        |||||||||||||||||| |||||| ||  ||||||| || ||  ||||||| || | ||
Db 1045 TGGCCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 1104

Qy  936 GAGAGATGCTGATTAGTAAGTTCAGACAATACTTCAAGTCCCTTACATCCTTCAGGACAT 995
        | || |||||||| | | |  | |||||  ||||||| |||||||||||||| || | ||
Db 1105 GGGACATGCTGATGAATCAACTGAGACACAACTTCAAATCCCTTACATCCTTTAGCAGAT 1164

Qy  996 GAGCTGCTGGATGCAGGTCTTCACTCAGCCAAAA-TGAGACACTTGATAAACAG 1048
        | |||  || |  |     |||| ||||  |||| ||||   ||||  ||||||
Db 1165 GGGCTCATGAACTCCTACTTTCATTCAGAGAAAAGTGAGGGGCTTGTGAAACAG 1218



RESULT 8 
ABS57291 

ID ABS57291 standard; cDNA; 1338 BP. 
XX 

AC ABS57291; 
XX 

DT 30-JAN-2003 (first entry) 
XX 

DE cDNA encoding human adenosine receptor. 
XX 

KW Human; mammalian; adenosine receptor; G-protein coupled receptor; 

KW GPCR; adenosine-mediated medical condition; vasodilation; hypotension; 

KW reversal of tachycardia; chronic renal disease; thyroid disorder; 

KW inflammation; asthma; hypertensive; antiarrhythmic; antiinflammatory; 

KW antiasthmatic; gene; ss. 

XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers



FT CDS 1..1005 

FT /*tag= a 

FT /product= "Adenosine receptor" 
XX 

PN US2002137887-A1. 
XX 

PD 26-SEP-2002. 
XX 

PF 17-JAN-2001; 2001US-0765034.
XX

PR 17-JAN-2001; 2001US-0765034.
XX 

PA (HEDR/) HEDRICK J A. 

PA (LACH/) LACHOWICZ J E. 

PA (WANG/) WANG W. 

PA (GUST/) GUSTAFSON E L. 

XX 

PI Hedrick JA, Lachowicz JE, Wang W, Gustafson EL; 
XX 

DR WPI; 2003-074992/07. 

DR P-PSDB; ABG72131. 
XX 

PT Novel isolated mammalian adenosine receptor polypeptide useful for 

PT identifying an agonist or antagonist of the receptor for treating 

PT vasodilation, hypotension, chronic renal diseases, thyroid disorders 

PT and inflammation - 
XX 

PS Example 1; Page 14-16; 19pp; English. 
XX 

CC The present invention relates to the isolation of a mammalian 

CC (human) adenosine receptor, and the polynucleotide sequence 

CC encoding it. The cloned receptor resembles a member of the 

CC G-protein coupled receptor (GPCR) superfamily that contains 

CC 7-transmembrane domains. The adenosine receptor is useful for 

CC identifying agonists and antagonists of the receptor, which may be 

CC useful for treating an adenosine-mediated medical condition. The 

CC adenosine receptor polypeptide sequence is also useful as an 

CC antigen to elicit antibody production in an immunologically 

CC competent host. An antibody which binds specifically to the 

CC adenosine receptor is useful for treating medical conditions caused 

CC or mediated by adenosine such as vasodilation, hypotension, reversal 

CC of tachycardia, chronic renal diseases, thyroid disorders and 

CC inflammation (e.g. asthma). The antibody can also be used to purify 

CC the adenosine receptor, or as a basis for immunoassays of the receptor. 

CC The polynucleotide sequence encoding the adenosine receptor is useful 

CC for producing vectors and host cells containing the vectors. It is 

CC also useful for measuring expression of a mammalian adenosine 

CC receptor gene in a biological sample. The present sequence encodes 

CC human adenosine receptor. 

XX 

SQ Sequence 1338 BP; 370 A; 288 C; 265 G; 415 T; 0 other; 

Query Match 38.3%; Score 590.8; DB 25; Length 1338; 
Best Local Similarity 75.2%; Pred. No. 9.2e-140; 

Matches 763; Conservative 0; Mismatches 247; Indels 4; Gaps 2; 

Qy   39 GCAGAATGGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATA 98
        | |  ||||||  ||||  | ||||  | || ||| ||||| ||||||||  | || | |
Db    8 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 67

Qy   99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158
        |||||||||| ||    |||||||  || |||||| || | ||| | ||||| |||  ||
Db   68 AGTACTACCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCA 127

Qy  159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218
         ||| || | ||||||| ||||||   ||||||||||||||||||| ||| | ||||| |
Db  128 TTGTTGTTTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCT 187

Qy  219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278
        ||||||| ||  |||||||||| ||||| ||||||||||| ||||| ||||||| |||||
Db  188 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 247

Qy  279 ATGCCAATGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTC 338
        |||||||||  ||  |||  |||||||| || ||||| ||||||||||||||||||||||
Db  248 ATGCCAATGGAAACTGGATATATGGAGACGTGCTCTGCATAAGCAACCGATATGTGCTTC 307

Qy  339 ACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGC 398
        |  |||||||||| |||||||| ||||| |||||||| || ||||| || |||||  ||
Db  308 ATGCCAACCTCTATACCAGCATTCTCTTTCTCACTTTTATCAGCATAGATCGATACTTGA 367

Qy  399 TCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCT 458
        | || ||||| ||||||||||||||| |||| |||||||| || ||||| ||||||||||
Db  368 TAATTAAGTATCCTTTCCGAGAACACCTTCTGCAAAAGAAAGAGTTTGCTATTTTAATCT 427

Qy  459 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518
        |  ||||  | ||||  ||||| ||||||||  | |||||||| ||  |  | || |||
Db  428 CCTTGGCCATTTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 487

Qy  519 CTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAAC 578
        ||||   ||  ||  | ||||  | ||| |  || | ||||||||||||| ||||  |
Db  488 CTGTTATAACTGACAATGGCACCACCTGTAATGATTTTGCAAGTTCTGGAGACCCCAACT 547

Qy  579 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638
        |||| |||||||||||| | || || ||  ||||||| ||||| |||||||| | |||||
Db  548 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 607

Qy  639 TGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAA 698
        |||| ||||| || |||||||| |   ||||| |||||  |||||    ||||   || |
Db  608 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 667

Qy  699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758
        |||| ||||| || || || || |    | ||||  || |||| || || |||||||||
Db  668 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 727

Qy  759 TACTCTTCACACCCTATCATATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTT 818
        | |  || |||||||||||  ||||||| ||| |||||||||| ||||||||||  ||||
Db  728 TGCCTTTTACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTT 787

Qy  819 G---GCCACAAGGATGTACACAGAAGGCCATCAAATCTATATACACACTGACACGGCCTC 875
        |   ||   |    || || |||   | |||||| ||  | ||||   |||||||||||
Db  788 GGAAGCAGTATCAGTGCACTCAGGTCGTCATCAACTCCTTTTACATTGTGACACGGCCTT 847

Qy  876 TGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTCCTCATGGGAGACCATTACA 935
        |||||||||||||||||| |||||| ||  ||||||| || ||  ||||||| || | ||
Db  848 TGGCCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 907

Qy  936 GAGAGATGCTGATTAGTAAGTTCAGACAATACTTCAAGTCCCTTACATCCTTCAGGACAT 995
        | || |||||||| | | |  | |||||  ||||||| |||||||||||||| || | ||
Db  908 GGGACATGCTGATGAATCAACTGAGACACAACTTCAAATCCCTTACATCCTTTAGCAGAT 967

Qy  996 GAGCTGCTGGATGCAGGTCTTCACTCAGCCAAAA-TGAGACACTTGATAAACAG 1048
        | |||  || |  |     |||| ||||  |||| ||||   ||||  ||||||
Db  968 GGGCTCATGAACTCCTACTTTCATTCAGAGAAAAGTGAGGGGCTTGTGAAACAG 1021



RESULT 9 
AAT71900
ID AAT71900 standard; cDNA; 1996 BP.
XX
AC AAT71900;
XX
DT 11-SEP-1997 (first entry)
XX
DE Human purinergic receptor P2U2 cDNA.
XX
KW P2U2 receptor; purinergic receptor; diagnosis; therapy; ss.
XX
OS Homo sapiens.
XX
FH Key Location/Qualifiers
FT CDS 625..1629
FT /*tag= a
XX
PN WO9720045-A2.
XX
PD 05-JUN-1997.
XX
PF 08-NOV-1996; 96WO-US18175.
XX
PR 15-NOV-1995; 95US-0559524.
PR 15-NOV-1995; 95US-0006782.
XX
PA (CORT-) COR THERAPEUTICS INC.
XX
PI Conley PB, Jantzen H;
XX
DR WPI; 1997-310601/28.
DR P-PSDB; AAW19854.
XX
PT New isolated purinergic receptor sub-type - used to develop
PT products for diagnosis and therapy, e.g. for screening for agonists
PT and antagonists which can modulate activation
XX
PS Claim 3; Fig 1A-C; 36pp; English.
XX
CC A cDNA clone (AAT71900) codes for a novel human purinergic receptor
CC subtype, designated P2U2 receptor (AAW19854), that is abundantly
CC expressed in kidney and in many cell lines of megakaryocyte or
CC erythroleukaemic origin and which is activated by ATP, UDP, UTP and
CC UDP. The clone was obtd. by amplifying DAMI (ATCC CRL 9792) cell
CC cDNA using primers (see also AAT72104-05) based on transmembrane
CC regions of mouse P2u and chicken P2Y1 receptors, and use of the PCR
CC product to screen the DAMI cDNA library to isolate the full-length
CC clone. P2U2 nucleic acids can be used in the recombinant prodn. of
CC P2U2 receptor polypeptides and as probes.
XX 

SQ Sequence 1996 BP; 513 A; 454 C; 381 G; 647 T; 1 other; 

Query Match 38.2%; Score 589.2; DB 18; Length 1996; 

Best Local Similarity 75.1%; Pred. No. 2.8e-139; 

Matches 762; Conservative 0; Mismatches 248; Indels 4; Gaps 2;

Qy   39 GCAGAATGGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATA 98
        | |  ||||||  ||||  | ||||  | || ||| ||||| ||||||||  | || | |
Db  632 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 691

Qy   99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158
        |||||||||| ||    |||||||  || |||||| || | ||| | ||||| |||  ||
Db  692 AGTACTACCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCA 751

Qy  159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218
         ||| || | ||||||| ||||||   ||||||||||||||||||| ||| | ||||| |
Db  752 TTGTTGTTTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCT 811

Qy  219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278
        ||||||| ||  |||||||||| ||||| ||||||||||| ||||| ||||||| |||||
Db  812 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 871

Qy  279 ATGCCAATGATAAGGGGACCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTC 338
        |||||||||  ||  |||  |||||||| || ||||| ||||||||||||||||||||||
Db  872 ATGCCAATGGAAACTGGATATATGGAGACGTGCTCTGCATAAGCAACCGATATGTGCTTC 931

Qy  339 ACACCAACCTCTACACCAGCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGC 398
        |  |||||||||| |||||||| ||||| |||||||| || ||||| || |||||  ||
Db  932 ATGCCAACCTCTATACCAGCATTCTCTTTCTCACTTTTATCAGCATAGATCGATACTTGA 991

Qy  399 TCATGAAGTACCCTTTCCGAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCT 458
        | || ||||| ||||||||||||||| |||| |||||||| || ||||| ||||||||||
Db  992 TAATTAAGTATCCTTTCCGAGAACACCTTCTGCAAAAGAAAGAGTTTGCTATTTTAATCT 1051

Qy  459 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518
        |  ||||  | ||||  ||||| ||||||||  | |||||||| ||  |  | || |||
Db 1052 CCTTGGCCATTTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 1111

Qy  519 CTGTCCCAAAAGAAGAGGGCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAAC 578
        ||||   ||  ||  | ||||  | ||| |  || | ||||||||||||| ||||  |
Db 1112 CTGTTATAACTGACAATGGCACCACCTGTAATGATTTTGCAAGTTCTGGAGACCCCAACT 1171

Qy  579 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638
        |||| |||||||||||| | || || ||  ||||||| ||||| |||||||| | |||||
Db 1172 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 1231

Qy  639 TGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAA 698
        |||| ||||| || |||||||| |   ||||| |||||  |||||    ||||   || |
Db 1232 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 1291

Qy  699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758
        |||| ||||| || || || || |    | ||||  || |||| || || |||||||||
Db 1292 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 1351

Qy  759 TACTCTTCACACCCTATCATATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTT 818
        | || || |||||||||||  ||||||| ||| |||||||||| ||||||||||  ||||
Db 1352 TGCTTTTTACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTT 1411

Qy  819 G---GCCACAAGGATGTACACAGAAGGCCATCAAATCTATATACACACTGACACGGCCTC 875
        |   ||   |    || || |||   | |||||| ||  | ||||   |||||||| ||
Db 1412 GGAAGCAGTATCAGTGCACTCAGGTCGTCATCAACTCCTTTTACATTGTGACACGGGCTT 1471

Qy  876 TGGCCTTTCTGAACAGTGCCATCAATCCCATCTTCTACTTCCTCATGGGAGACCATTACA 935
        ||| |||||||||||||| |||||| ||  ||||||| || ||  ||||||| || | ||
Db 1472 TGGGCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 1531

Qy  936 GAGAGATGCTGATTAGTAAGTTCAGACAATACTTCAAGTCCCTTACATCCTTCAGGACAT 995
        | || |||||||| | | |  | |||||  ||||||| |||||||||||||| || | ||
Db 1532 GGGACATGCTGATGAATCAACTGAGACACAACTTCAAATCCCTTACATCCTTTAGCAGAT 1591

Qy  996 GAGCTGCTGGATGCAGGTCTTCACTCAGCCAAAA-TGAGACACTTGATAAACAG 1048
        | |||  || |  |     |||| ||||  |||| ||||   ||||  ||||||
Db 1592 GGGCTCATGAACTCCTACTTTCATTCAGAGAAAAGTGAGGGGCTTGTGAAACAG 1645



RESULT 10 
AAT75146 



ID AAT75146 standard; cDNA; 1428 BP. 
XX 

AC AAT75146; 
XX 

DT 07-OCT-1997 (first entry) 
XX 

DE Human ATP receptor cDNA. 
XX 

KW ATP receptor; G-protein coupled receptor; agonist; antagonist; ss. 
XX 

OS Homo sapiens.
XX 

FH Key Location/Qualifiers

FT CDS 92..1096

FT /*tag= a

FT /transl_except= (pos: 725..727, aa:Ser)

FT /transl_except= (pos: 764..766, aa:Ser)

FT /transl_except= (pos: 820..822, Xaa)

FT /note= "Xaa = unknown"

FT primer_bind complement(92..100)

FT /*tag= b

FT /note= "binding site for primer used to amplify

FT cDNA for baculovirus expression"

FT primer_bind complement(92..109)

FT /*tag= c

FT /note= "binding site for primers used to amplify

FT cDNA for bacterial or COS expression"

FT primer_bind 1076..1095

FT /*tag= d

FT /note= "binding site for primer used to amplify

FT cDNA for COS expression"

FT primer_bind 1079..1096

FT /*tag= e

FT /note= "binding site for primer used to amplify

FT cDNA for bacterial expression"

FT primer_bind 1085..1096

FT /*tag= f 

FT /note= "binding site for primer used to amplify 

FT cDNA for baculovirus expression" 

XX 

PN WO9724929-A1.
XX 

PD 17-JUL-1997. 
XX 

PF 11-JAN-1996; 96WO-US00392.
XX

PR 11-JAN-1996; 96WO-US00392.



XX 

PA (HUMA-) HUMAN GENOME SCI INC. 
XX 

PI Li Y; 
XX 

DR WPI; 1997-372505/34. 

DR P-PSDB; AAW22732. 
XX 

PT Isolated human ATP receptor - agonists and antagonists of which are 

PT useful in treatment of, e.g. asthma, hypertension, arterial 

PT thrombosis and psychotic and neurological disorders 
XX 

PS Claim 7; Fig 1A-C; 53pp; English. 
XX 

CC A cDNA clone (AAT75146) codes for human ATP receptor (AAW22732), a 

CC polypeptide structurally related to the G protein-coupled receptor 

CC family. It was discovered in a human placenta cDNA library. 

CC cDNA encoding the mature receptor, deposited as ATCC 97333, can 

CC be expressed in bacterial (e.g. E. coli), mammalian (e.g. COS) or 

CC insect (e.g. Sf9) host cells and used to screen for agonists and 

CC antagonists useful in the treatment of a variety of disorders. 

CC It can also be used to identify a mutation in an ATP receptor gene 

CC and thus to diagnose diseases, or susceptibility to diseases, 

CC related to ATP receptor underexpression.

XX 

SQ Sequence 1428 BP; 394 A; 308 C; 290 G; 435 T; 1 other; 

Query Match 38.1%; Score 587.2; DB 18; Length 1428; 

Best Local Similarity 75.0%; Pred. No. 7.8e-139; 

Matches 760; Conservative 1; Mismatches 249; Indels 4; Gaps 2; 

Qy 39 GC AGAAT GGC AC AGAAT TT AT CTT GT GAGAAT TGGT T GGCAACAGAGGCT AT CTT GAAT A 98 

II MINI I I I I I I I I I I I I I I I I I I I I I I I I I I I I Mill 
Db 99 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 158 

Qy 99 AGTACTACCTCTCTGCATTTTATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCA 158 

I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 159 AGTACTACCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCA 218 

Qy 159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218

III II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I

Db 219 TTGTTGTTTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCT 278



Qy 219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278

I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 
Db 279 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 338 

Qy 279 AT GC CAAT GAT AAG GG GAC CT AT GGAGAT GT T C T C T GT AT AAGCAACC GAT AT GT GCT T C 338 

I I I I I I I I I II I I I MINIM II I I I I I I I I I I II I I I II I I I I I I I I I I 
Db 339 AT GC CAAT GGAAACT GGAT AT AT GGAGAC GT G C T CT GC ATAAG CAAC C GAT AT GT GCT T C 398 

Qy 339 ACAC CAACCT CT AC AC CAGCAT CCTCTTCCT C ACT T TC AT TAG CAT GGAC C GAT AT CT GC 398 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I II 
Db 399 AT GC CAACCT C TAT AC CAGCAT TCTCTTTCT C ACT TTT AT CAGCAT AGAT C GAT ACT T GA 458 

Qy 399 T CAT GAAGT AC CCTTT CCGAGAACACTTT CT ACAAAAGAAGGAATTTGCCATTTTAAT CT 458 

I II I I I I I II I I I I I I II I I I I I I I I I I I I I I I I I II I III I I I I I M I I I 
Db 4 59 TAAT T AAGT AT CCTTT CC GAGAACAC CT T CT GCAAAAGAAAGAGT GT GCT AT T TT AAT CT 518 

Qy 459 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518 

I I I I I I I I I I I II I I I I I I I I M I I I I I I I I I I I I I II Ml 

Db 519 CCTTGGCCATGTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 578

Qy 519 CT GT C C CAAAAGAAGAGGG C AGT AACT GCAT C GAC TAT GCAAGT T CT GGAAAC C CT GAAC 57 8 

I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 579 CT GTT ATAACT GACAATGGCACCACCTGTAAT GATTTT GCAAGTT CTGGAGAC CC CAACT 638 

Qy 579 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638 

I I I I I I I I I I I I I I I I I II II II I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 639 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 698 

Qy 639 T GT GCT T CT T CT ACTACAAGAT GGT AGT CT T C T TAAAGAGGAG GAGCC AG C AG CAAG CAA 698 

II I I I I I I I I I I I I II I I I I I I M I I I I I I I I I I I I I I III 

Db 699 TGTGTTTCTTTTATTACAAGATTGCCTCCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 758 

Qy 699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758 

I I I I I I I I I I I I I I I I I Mill I I I I I I I I I I I I I I I I I I I 

Db 759 CTGCCTCGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 818 

Qy 759 TAC T CTT C ACAC C CT AT CAT AT CAT GCGCAAT T T GAGGAT C GC CT C AC GC CT G GAT AGT T 818 

I I : II I I I I I I I I I I I I I I I I I I III I I I I I I II I I I I I I II I I I I I I I I 

Db 819 TGCYTTTTACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTT 878 

Qy 819 G GCCACAAGGAT GTACACAGAAGGCCATCAAAT CT ATATACACACT GACACGGCCT C 87 5 

I II I I I I I I I I I I I II I I I I I I I I I I I I II I I I I I I 

Db 879 GGAAGCAGT AT C AGT GCACT CAG GT C GT CAT CAACT C CT T T TAC ATTGT GACAC GGC CT G 938 

Qy 876 T GGC C T TT CT GAAC AGT GC CAT CAAT CCCAT CT T C T ACT T CCT CAT GGGAGAC C ATT AC A 935 

I I M I I I I I I I I I I I I I I I I I I I I II I I I I I I I II II I I I I I I I II I II 
Db 939 TGGCCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTGTGGGAGATCACTTCA 998 

Qy 936 GAGAGAT GCT GAT T AGT AAGT T CAG ACAAT ACT T CAAGT C C CT T ACAT C CT T CAGGAC AT 995 

I I I I I M I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 999 GGGACAT GCT GAT GAATCAACT GAGACACAACTT CAAAT CC CTT ACAT C CTTTAGCAGAT 1058 

Qy 996 GAGCT GCT GGAT GC AGGTCTT CACT CAGCCAAAA- T GAGACACTT GATAAACAG 104 8 

I I I I III I I I I I I I I I I II I I I I I I I I I I I I I II 

Db 1059 GGGCTCATGAACTCCTACTTTCATTCAGAGAAAAGTGAGGGGCTTGTGAAACAG 1112 



RESULT 11 
AAC81122 

ID AAC81122 standard; cDNA; 1385 BP. 
XX 

AC AAC81122; 
XX 

DT 14-FEB-2001 (first entry) 
XX 

DE Human secreted protein gene 37 SEQ ID NO: 47. 
XX 

KW Human; secreted protein; diagnosis; immunosuppressive; antiarthritic; 

KW antirheumatic; antiproliferative; cytostatic; cardiant; vasotropic; 

KW cerebroprotective; nootropic; neuroprotective; antibacterial; virucide; 

KW fungicide; ophthalmological; vulnerary; gene therapy; autoimmune disease;

KW hyperproliferative disorder; cardiovascular disorder; angiogenesis;

KW cerebrovascular disorder; nervous system disorder; infection; skin aging; 

KW ocular disorder; wound healing; food additive; preservative; ss. 

XX 

OS Homo sapiens.
XX 

PN WO200061628-A1. 
XX 

PD 19-OCT-2000. 
XX 

PF 06-APR-2000; 2000WO-US09070.
XX

PR 09-APR-1999; 99US-0128695.

PR 14-JAN-2000; 2000US-0176052.
XX 

PA (HUMA-) HUMAN GENOME SCI INC. 
XX 

PI Rosen CA, Ruben SM, Komatsoulis G; 
XX 

DR WPI; 2000-619228/59. 

DR P-PSDB; AAB45344.
XX 

PT New nucleic acid molecules encoding 49 human secreted proteins for 

PT diagnosing, preventing, treating or ameliorating medical conditions and 

PT used as food additives or preservatives - 

XX 

PS Claim 1; Page 412; 454pp; English. 
XX 

CC The polynucleotide sequences given in AAC81086 to AAC81134 encode the 

CC human secreted proteins given in AAB45308 to AAB45356. AAB45357 to 

CC AAB45384 represent human secreted polypeptide sequences and proteins 

CC homologous to them, which are given in the exemplification of the present 

CC invention. Human secreted proteins have activities based on the tissues 

CC and cells the genes are expressed in. Examples of activities include: 

CC antiarthritic; immunosuppressive; antirheumatic; antiproliferative; 

CC cytostatic; cardiant; vasotropic; cerebroprotective; nootropic; 

CC neuroprotective; antibacterial; virucide; fungicide; ophthalmological; 

CC and vulnerary. The polynucleotides and polypeptides can be used to 

CC prevent, treat or ameliorate a medical condition in e.g. humans, mice, 

CC rabbits, goats, horses, cats, dogs, chickens or sheep. They are also used 

CC in diagnosing a pathological condition or susceptibility to a 

CC pathological condition. Disorders which are diagnosed or treated include 



CC autoimmune diseases, hyperproliferative disorders, cardiovascular

CC disorders, cerebrovascular disorders, angiogenesis, nervous system

CC disorders, infections caused by bacteria, viruses and fungi and ocular

CC disorders. The polypeptides can also be used to aid wound healing and

CC epithelial cell proliferation, to prevent skin aging due to sunburn, to 

CC maintain organs before transplantation, for supporting cell culture of 

CC primary tissues, to regenerate tissues and in chemotaxis. The 

CC polypeptides can also be used as a food additive or preservative to 

CC increase or decrease storage capabilities, fat content, lipid, protein, 

CC carbohydrate, vitamins, minerals, cofactors and other nutritional

CC components. AAC81077 to AAC81085 and AAB45307 represent sequences used in 

CC the exemplification of the present invention. 

XX 

SQ Sequence 1385 BP; 385 A; 296 C; 275 G; 429 T; 0 other; 

Query Match 37.6%; Score 580.4; DB 21; Length 1385; 

Best Local Similarity 75.2%; Pred. No. 4.1e-137; 

Matches 763; Conservative 0; Mismatches 246; Indels 5; Gaps 3;

Qy 39 GCAGAATGGCACAGAATTTATCTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATA 98

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 56 GGATCATGGCATGGAATGCAACTTGCAAAAACTGGCTGGCAGCAGAGGCTGCCCTGGAAA 115

Qy 99 AGT ACT AC C T CT CT GCAT T T TAT GCAAT C GAGT T C AT TT T T GGACT GCT T G GGAAT GT CA 158 

I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I II 

Db 116 AGTACTACCTTTCCATTTTTTATGGGATTGAGTTCGTTGTGGGAGTCCTTGGAAATACCA 175 

Qy 159 CTGTGGTGTTCGGCTACCTCTTCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTT 218 

III II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I II I 
Db 176 TTGTTGTTTACGGCTACATCTTCTCTCTGAAGAACTGGAACAGCAGTAATATTTATCTCT 235

Qy 219 TTAACCTTTCCATCTCTGACTTTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTT 278

I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I II I I I I I I I I I I I I 
Db 236 TTAACCTCTCTGTCTCTGACTTAGCTTTTCTGTGCACCCTCCCCATGCTGATAAGGAGTT 295 

Qy 27 9 AT GCCAAT GATAAGG GGAC CT AT GGAGAT GT T CT CT GT AT AAG C AAC C GAT AT GT GCT T C 338 

I I I II I I I I II III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 29 6 AT GCCAAT GGAAACT GGAT AT AT GGAGAC GT GCT CT GCAT AAG CAAC C GAT AT GT GCT T C 355 

Qy 33 9 ACACCAAC CT CT ACAC C AGCAT C CT CT T C CT C ACT T T CAT TAG CAT GGAC C GAT AT CT GC 398 

I MINIMI! MINIM I II I I I I I I I I I I II I I I I I II I I I I I II 
Db 35 6 AT GCCAAC CT CT AT AC C AGCAT T CT CT TT CT CACT T T TAT C AG C AT AGAT C GAT ACTT GA 415 

Qy 399 T CATGAAGTACCCTTT CC GAGAACACTTTCTACAAAAGAAGGAATTT GCCATTTTAAT CT 458 

I II I I I I I I I I I I I I I I I I I I I I I I I I M I II II I || | | | M I I I I I I I I I I 
Db 416 T AATTAAGT AT C CT T T C C GAGAAC AC CTT CT GCAAAAGAAAGAGTTT GCT ATT TTAAT CT 475 

Qy 459 CGCTGGCTGTCTGGGCCTTAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATT 518

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 476 CCTTGGCCATTTGGGTTTTAGTAACCTTAGAGTTACTACCCATACTTCCCCTTATAAATC 535

Qy 519 CT GT C C CAAAAGAAGAGG GCAGT AACT GCAT C GACT AT GCAAGT TCT GGAAAC C CT GAAC 578 

MM II II I II I I Mill I I I I I I I I I I I II I I I I I I I I 
Db 536 CT GTTAT AACT GACAAT G GCAC C AC CT GT AAT GAT T T T GCAAGT TCT GGAGAC C C CAACT 595 



Qy 579 ACAATCTCATTTACAGCCTCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGA 638

II I I I I I I I I I I I I II I II II II I I I I I II I I I II I II I I I I I I II I I I

Db 596 ACAACCTCATTTACAGCATGTGTCTAACACTGTTGGGGTTCCTTATTCCTCTTTTTGTGA 655



Qy 639 TGTGCTTCTTCTACTACAAGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAA 698 

I I I I I I I II I I I 1 I I M I I I I I I I I I I I I I I I I I I I I I I Ml 

Db 656 TGTGTTTCTTTTATTACAAGATTGCTCTCTTCCTAAAGCAGAGGAATAGGCAGGTTGCTA 715 

Qy 699 CTGCCCTGCCACTGGACAAACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTA 758 

I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 716 CTGCTCTGCCCCTTGAAAAGCCTCTCAACTTGGTCATCATGGCAGTGGTAATCTTCTCTG 775

Qy 759 TACTCTTCACACCCTATCATATCATGCGCAATTTGAGGATCGCCTCACGCCTGGATAGTT 818 

I I I II I I I I I I I I I I I I I I I I I I III I I I I I I I I I I I I I I I I I I I I I I I I 

Db 776 TGCTTTTTACACCCTATCACGTCATGCGGAATGTGAGGATCGCTTCACGCCTGGGGAGTT 835 

Qy 819 G GCCACAAGGATGTACACAGAAGGCCAT CAAAT CTATATACACACTGACACGGCCTC 875 

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 836 G GAAGC AGTAT CAGT G CACT C AGGT C GT CAT C AACT C CT TT T ACATT GT GAC AC - GCCT T 894 

Qy 876 TGGCCTTTCT GAAC AGT GCC AT CAAT C C CAT CT T CT ACT T C CT CAT G GGAGAC CAT T AC A 935 

I I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I II II I I I I I I I II I II 

Db 895 TGGCCTTTCTGAACAGTGTCATCAACCCTGTCTTCTATTTTCTTTTGGGAGATCACTTCA 954

Qy 936 GAGAGAT GCT GAT T AGT AAGT T CAGACAATACT T CAAGT C C CT T AC AT C CT T C AGGAC AT 995 

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 955 GGGACAT GCT GAT GAAT CAACTGAGACACAACTT CAAAT CCCTTACATCCTTTAGCAGAT 1014 

Qy 996 GAGCT GC T G GAT GC AGGT CT T CACT C AG C C AAAA- T GAGAC ACT T GATAAACAG 104 8 

I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1015 GGGCTCATGAACTCCTACTTTCATTCAGAGAAAAGTGAGGGGCTTGTGAAACAG 1068 



RESULT 12 
AAL43942 

ID AAL43942 standard; DNA; 1011 BP. 
XX 

AC AAL43942; 
XX 

DT 27-SEP-2002 (first entry) 
XX 

DE Human G protein-coupled receptor coding sequence. 
XX 

KW Human; gene therapy; G protein-coupled receptor; drug development; 
KW central nervous system disease; endocrine disease; metabolic disease; 
KW cancer; respiratory disease; digestive disease; immune disease; 
KW inflammation; infection; circulatory disease; gene; ds.
XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 
FT CDS 1..1011 
FT /*tag= a 

FT /partial 

FT /product= "Human G-protein coupled receptor"

FT /note= "No stop codon is given" 

XX 

PN WO200257441-A1. 
XX 



PD 25-JUL-2002. 
XX 

PF 17-JAN-2002; 2002WO-JP00270.
XX

PR 18-JAN-2001; 2001JP-0010714.

PR 30-MAR-2001; 2001JP-0102484.
XX 

PA (TAKE ) TAKEDA CHEM IND LTD. 

XX

PI Miwa M, Ito T, Shintani Y, Miyajima N; 
XX 

DR WPI; 2002-566800/60. 

DR P-PSDB; AA015399. 
XX 

PT Human kidney-originated G protein-coupled receptor protein TGR30 and 

PT encoded DNA, for developing drugs to treat central nervous diseases, 

PT endocrine diseases, metabolic diseases and cancer, including gene 

PT therapy - 
XX 

PS Claim 6; Page 90-91; 98pp; Japanese. 
XX 

CC The invention comprises the amino acid and coding sequence of a human G 

CC protein-coupled receptor. The DNA and protein sequences of the invention 

CC are useful for developing drugs to prevent or treat (gene therapy):

CC central nervous system diseases; endocrine diseases; metabolic diseases; 

CC cancer; respiratory diseases; digestive diseases; immune diseases; 

CC inflammations; infections; and circulatory diseases. The present DNA 

CC sequence encodes the human G protein-coupled receptor of the invention. 
XX 

SQ Sequence 1011 BP; 257 A; 263 C; 188 G; 303 T; 0 other; 

Query Match 8.2%; Score 126.6; DB 24; Length 1011; 

Best Local Similarity 49.9%; Pred. No. 6.2e-22; 

Matches 377; Conservative 0; Mismatches 369; Indels 9; Gaps 2; 

Qy 60 CTT GT GAGAATT GGTT GGCAACAGAGGCT AT CTTGAAT AAGT ACTACCT CT CT GCATTTT 119 

I I I I I I I I I I I II I I I I I I I I I I I I I I I I III 

Db 5 9 CTT T T GGAAAT T G C ACT GAT GAAAAC AT C C C ACTCAAGAT GC ACT AC CT C CCT GT T AT TT 118 

Qy 120 ATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCT 179

III II I I I I I I I I I I I I I I I I I II I I I I I I I I 

Db 119 AT GGC AT T AT CT T CCT CGT GG GAT TT C C AG GCAAT G CAGT AGT GAT AT C C ACTT ACAT TT 17 8 

Qy 180 TCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACT 239

II I I I I I I I 11 I I I I I I I II I I I I I I I I II I I I I 

Db 17 9 T CAAAAT GAGAC CT T GGAAGAGCAG C AC CAT CATT AT GCT GAAC CT GGC CT GCAC AGAT C 2 38 

Qy 24 0 TTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAAT GATAAGGGGA 2 96 

I I III I I II I I I I I I I I I I I I I I I I I I I I I I I I III 

Db 23 9 T GC T GT AT CT GAC C AG C CT C C C CT T CCT GAT T C AC TACT AT GCC AGT G GC GAAAAC T GGA 2 98 

Qy 2 97 C CT AT G GAGAT GT T CT CTGT ATAAGCAAC C GAT AT GT GCT T CAC AC CAAC CT CT AC ACCA 356 

I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I I 

Db 2 99 T CT T T GGAGAT T T CAT GT GT AAGT TT AT C C GCT T CAGCT T C CAT T T CAAC CT GT AT AGC A 358 

Qy 357 GC AT CCTCTTCCT C ACT TT C AT T AGCAT GGAC C GAT AT CT G CT CAT GAAGT ACC CT T T CC 416 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 



Db 359 GCATCCTCTTCCTCACCTGTTTCAGCATCTTCCGCTACTGTGTGATCATTCACCCAATGA 418 

Qy 417 GAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCT 476 

I I I I I I I I II III II II II I I I I I I I 

Db 419 GCTGCTTTTCCATTCACAAAACTCGATGTGCAGTTGTAGCCTGTGCTGTGGTGTGGATCA 47 8 

Qy 477 T AGT GAC CT T AGAAGT T CT AC C CAT GCT CACT T T CAT CAATTCT GT C C CAAAAGAAGAGG 536 

I I I I I I I I I I I I I I I I I I I I I I I I 

Db 479 T T T CACT GGT AGCT GT CAT T C C GAT GAC CT T CT T GAT C ACAT CAAC CAACAGGAC CAAC A 538 

Qy 537 GC AGT AACT G CAT C GAC TAT G CAAGTT CT GGAAAC C CT GAACACAAT CT CAT T T AC AGC C 596 

I Ml I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 539 GAT C AG CCTGTCTC GAC C T C AC C AGT T C G G AT GAACT CAAT ACTATTAAGTGGT 592 

Qy 597 TCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACA 656 

I I I I I I I I I I I I I I I I I I I I I II 

Db 593 ACAACCTGATTTTGACTGCAACTACTTTCTGCCTCCCCTTGGTGATAGTGACACTTTGCT 652 

Qy 657 AGAT G GT AGT CTT CT TAAAGAGGAG GAGC C AGC AGCAAGCAACT GC C CT GC CACT GGAC A 716 

II I I I I II I I I I I I II III 

Db 653 AT AC CAC GAT T ATCCACACT CT GAC C CAT GGACT GCAAACT GAC AG CT GC CT T AAGC AGA 712 

Qy 717 AACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATC 77 6 

I I I I I I II I I I I II Ml M M I I I I I I 

Db 713 AAGCACGAAGGCTAACCATTCTGCTACTCCTTGCATTTTACGTATGTTTTTTACCCTTCC 772 

Qy 777 ATATCATGCGCAATTTGAGGATCGCCTCACGCCTG 811

It I I I I I I I MM I I I I I I I I 

Db 773 ATATCTTGAGGGTCATTCGGATCGAATCTCGCCTG 807



RESULT 13 
AAS07948 

ID AAS07948 standard; cDNA; 1014 BP. 
XX 

AC AAS07948; 
XX 

DT 23-OCT-2001 (first entry) 
XX 

DE Human cDNA encoding G-protein coupled receptor, hRUP21. 
XX 

KW Human; G-protein coupled receptor; GPCR; hRUP21; agonist; 

KW inverse agonist; lung cancer; ss. 

XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers

FT CDS 1..1014

FT /*tag= a

FT /product= "hRUP21"

XX 

PN WO200136471-A2. 
XX 

PD 25-MAY-2001. 
XX 

PF 16-NOV-2000; 2000WO-US31509.
XX 

PA (AREN-) ARENA PHARM INC. 
XX 

PI Chen R, Dang HT, Lowitz KP; 
XX 

DR WPI; 2001-355616/37. 

DR P-PSDB; AAU04375. 
XX 

PT Endogenous and non-endogenous versions of human G-protein coupled 

PT receptors for direct identification of candidate compounds as agonists, 

PT inverse agonists or partial agonists for use as therapeutic agents - 

XX 

PS Claim 55; Page 113-114; 159pp; English. 
XX 

CC The sequence encodes a human G-protein coupled receptor (GPCR),

CC hRUP21. The endogenous and non-endogenous, constitutively activated

CC versions of human G-protein coupled receptors (GPCR), are useful for 

CC direct identification of candidate compounds as receptor agonists, 

CC inverse agonists or partial agonists having applicability as therapeutic 

CC agents for treating diseases related to GPCR, e.g. lung cancer. 

CC Non-endogenous versions of human GPCRs are also utilized in research

CC settings, and in vitro and in vivo systems incorporating GPCRs can be

CC utilised to elucidate and understand the roles these receptors

CC play in the human condition, both normal and diseased.

XX 

SQ Sequence 1014 BP; 258 A; 263 C; 189 G; 304 T; 0 other; 

Query Match 8.2%; Score 126.6; DB 22; Length 1014; 

Best Local Similarity 49.9%; Pred. No. 6.2e-22; 

Matches 377; Conservative 0; Mismatches 369; Indels 9; Gaps 2; 

Qy 60 CTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTT 119 

I I I I I I I I I I I II I M I I I I I I I I II I I I III 

Db 59 C TT T T GGAAAT T GCACT GAT GAAAAC AT C C CACT CAAGAT GCACT AC CT C C CT GTT AT T T 118 

Qy 120 ATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCT 179



Db 119 AT GGCATT ATCTT CCT CGT GGGATTT C CAGGCAAT GCAGTAGT GATAT CCACTTACATTT 17 8 

Qy 180 TCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACT 239

II | | | | I I I I I I I I I I II II II I I I I I I II I I I I 

Db 179 TCAAAATGAGACCTTGGAAGAGCAGCACCATCATTATGCTGAACCTGGCCTGCACAGATC 238

Qy 24 0 T T GCT T T C CT GT GC AC CCT T C C CAT CCT GATAAAGAGTT AT GC CAAT GATAAGGGGA 296 

I I III I I I I I I I I I I I I I I I I I I I I I I I I II M Ml 

Db 239 TGCTGTATCTGACCAGCCTCCCCTTCCTGATTCACTACTATGCCAGTGGCGAAAACTGGA 2 98 

Qy 297 C CT AT GGAGAT GT T CT CT GT ATAAGCAAC C GATAT GT GCT T CAC ACCAAC CT CT ACAC C A 356 



Db 299 TCTTTGGAGATTTCATGTGTAAGTTTATCCGCTTCAGCTTCCATTTCAACCTGTATAGCA 358

Qy 357 GCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAAGTACCCTTTCC 416



Db 359 GCATCCTCTTCCTCACCTGTTTCAGCATCTTCCGCTACTGTGTGATCATTCACCCAATGA 418 

Qy 417 GAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCT 476

I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 419 GCTGCTTTTCCATTCACAAAACTCGATGTGCAGTTGTAGCCTGTGCTGTGGTGTGGATCA 478

Qy 477 TAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCCAAAAGAAGAGG 536

I III II I II III I MINIM I I I 

Db 479 TT TC ACT GGT AGCT GT CAT T C C GAT GAC C T T CT T GAT CAC AT CAACCAACAGGAC CAACA 538 

Qy 537 GCAGT AACT GC AT C GACT AT GCAAGT T CT GGAAAC CCT GAAC ACAAT CT CATTT ACAGC C 596 

I I I I I I I I I I I I I I I I I I M I I I II I I I I I 

Db 539 GAT CAGC C T GT CT C GAC CT CAC C AGT T C GG AT GAACT CAATACTATTAAGT GGT 592 

Qy 597 TCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACA 656 

I I I I I I I I I I I I I I I I I I I I I II 

Db 593 ACAACCTGATTTT GACT GCAACTACTTTCTGCCTCCCCTT GGT GATAGTGACACTTT GCT 652 

Qy 657 AGAT GGT AGT CT T CT T AAAGAG GAGGAGC CAGC AGCAAGCAACT GC C CT GC CACT GGACA 716 

II I I I I I I I I I I I I II III 

Db 653 AT ACCAC GAT TAT C CAC AC T CT GACC CAT GGAC T GCAAACT GACAGCT GCCT TAAGC AGA 712 

Qy 717 AACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATC 776 

I I I I I I I I I I II II III II II I I I I I I 

Db 713 AAGCACGAAGGCTAACCATTCTGCTACTCCTTGCATTTTACGTATGTTTTTTACCCTTCC 772 

Qy 777 ATATCATGCGCAATTTGAGGATCGCCTCACGCCTG 811

I I I I I II I I I I I I I I I I I I I I II 

Db 773 ATATCTTGAGGGTCATTCGGATCGAATCTCGCCTG 807



RESULT 14 
ABZ42876 

ID ABZ42876 standard; DNA; 1014 BP. 
XX 

AC ABZ42876; 
XX 

DT 06-MAR-2003 (first entry) 
XX 

DE Human GPCR polynucleotide SEQ ID NO 13. 



XX 

KW Human; GPCR; G protein coupled receptor; signal transduction; olfactory; 

KW drug development; gustatory; taste; fragrance; gene; ds.

XX 

OS Homo sapiens. 
XX 

PN WO200216548-A2. 
XX 

PD 28-FEB-2002. 
XX 

PF 30-JUL-2001; 2001WO-IB01446.
XX

PR 04-AUG-2000; 2000JP-0237818.

PR 13-FEB-2001; 2001JP-0034434.
XX 

PA (NISC-) JAPAN SCI & TECHNOLOGY CORP. 
XX 

PI Haga T, Takeda S, Mitaku S; 
XX 

DR WPI; 2002-304118/34. 

DR P-PSDB; ABP95602. 
XX 

PT Database global search for G protein-coupled receptors, proteins and 

PT encoded genes for studying in vivo signal transduction mechanism and 

PT identifying targets for drug development 
XX 

PS Claim 9; SEQ ID NO 13; 97pp + Sequence Listing; Japanese. 
XX 

CC The invention relates to a method for screening G protein-coupled 

CC receptor (GPCR) genes (ABZ42870-ABZ43216) and/or GPCR proteins 

CC (ABP95596-ABP95942) by extracting open-reading frames containing 6-8 

CC transmembrane domains with 250-1000 amino acid residues to give a gene 

CC homologous with a known GPCR gene. The receptor proteins and encoded 

CC genes are useful for studying in vivo signal transduction mechanism and 

CC identifying targets for drug development e.g. based on olfactory and 

CC gustatory receptors in form of agonists and antagonists by screening 

CC intrinsic and extrinsic ligands as bitter taste inhibitors, taste 

CC enhancers and fragrance improvers.

CC Note: The sequence data for this patent did not form part of the printed 

CC specification, but was obtained in electronic format directly from WIPO 

CC at ftp.wipo.int/pub/published_pct_sequences. 
XX 

SQ Sequence 1014 BP; 258 A; 263 C; 189 G; 304 T; 0 other; 

Query Match 8.2%; Score 126.6; DB 24; Length 1014; 
Best Local Similarity 49.9%; Pred. No. 6.2e-22; 

Matches 377; Conservative 0; Mismatches 369; Indels 9; Gaps 2; 

Qy 60 CTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTT 119 

I I I I I I I I I I I II I I I I I I I II I I I I I I I Ml 

Db 59 CT T T T GGAAAT T GCACT GAT GAAAAC AT C C C ACT CAAGAT GCACT AC C T C C CT GT T AT TT 118 

Qy 120 ATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCT 179 

Ml M I I I I I I I I I I I I I I I I I I I I i I I I I I I 

Db 119 ATGGCATTATCTTCCTCGTGGGATTTCCAGGCAATGCAGTAGTGATATCCACTTACATTT 178 



Qy 180 T CT GC AT GAAGAACT G GAAC AGC AGCAAT GT CT AT CT T TT TAACC T TT C CAT C T CT GACT 239 



II I II I 1 1 1 1 1 1 1 1 1 1 1 1 II II I 1 1 1 1 1 II I I 1 1 

Db 17 9 T CAAAAT GAGAC CT T G GAAGAGCAGC AC CAT CAT TAT GCT GAACCT GGCCT G C AC AGAT C 238 

Qy 24 0 TTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAAT GATAAGGGGA 2 96 

I | I M I I I I I I II I I II I II I I I I I I I I I M I I I I I 

Db 239 TGCTGTATCTGACCAGCCTCCCCTTCCTGATTCACTACTATGCCAGTGGCGAAAACTGGA 2 98 

Qy 2 97 C CT AT GGAGAT GTT CT CT GT ATAAGCAAC C GAT AT GT GCTT CACACCAAC CT CT ACAC C A 356 

I I I I M II I I I I I I I I I I I I Ml I I I I I I I I I I I 

Db 299 T CTTT GGAGATTT CAT GT GTAAGTTT AT C CGCTTCAGCTTCCATTTCAACCT GTAT AGCA 358 

Qy 357 GCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAAGTACCCTTTCC 416

I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 

Db 359 GC AT CCTCTTCCT CAC CT GT T T CAGCAT CT T CCGCT ACT GT GT GAT CAT T CAC C CAAT GA 418 

Qy 417 GAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTTAATCTCGCTGGCTGTCTGGGCCT 476 

| | I I I I I I II III II II II I II I I I I 

Db 419 GCTGCTTTTCCATTCACAAAACTCGATGTGCAGTTGTAGCCTGTGCTGTGGTGTGGATCA 478 

Qy 477 T AGT GAC CT T AGAAGT T CT AC C CAT GCT CACTTT CAT CAAT T CT GT C C CAAAAGAAGAGG 536 

I I I I I I I I I I I I I I I M I I I I I I I 

Db 47 9 T T T CAC T G GT AG C T GT CAT T C C GAT GAC C T T C T T GAT CAC AT C AAC C AAC AG GAC C AAC A 538 

Qy 537 GC AGT AACT GCAT C GAC TAT GCAAGTT CT GGAAAC C CT GAACACAAT CT C AT TT ACAGC C 596 

I Ml I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 539 GATCAGCCTGTCTCGACCTCACCAGTTCGG AT GAACT CAAT AC TAT T AAGT GGT 592 

Qy 597 TCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACA 656 

I I I I I I I I I I I I I I I I I I I I I II 

Db 593 ACAACCTGATTTTGACTGCAACTACTTTCTGCCTCCCCTTGGTGATAGTGACACTTTGCT 652 

Qy 657 AGAT GGT AGT CTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACT GCCCT GCC ACT GGACA 716 

II I I I I II I I I I I I II Ml 

Db 653 AT AC CAC GAT TAT CCACACT CT GACCCATGGACT GCAAACT GAC AGCT GC CT TAAGC AGA 712 

Qy 717 AACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATC 776

I I I I I I I I I I I I I I I I I I I I 

Db 713 AAGCACGAAGGCTAACCATTCTGCTACTCCTTGCATTTTACGTATGTTTTTTACCCTTCC 772 

Qy 777 ATATCATGCGCAATTTGAGGATCGCCTCACGCCTG 811 

I I I I I I I I I I I I I I I I I I I I I I I 

Db 773 ATATCTTGAGGGTCATTCGGATCGAATCTCGCCTG 807



RESULT 15 
ABN85630 

ID ABN85630 standard; DNA; 1014 BP. 
XX 

AC ABN85630; 
XX 

DT 18-SEP-2002 (first entry) 
XX 

DE Human P2Y-like receptor variant encoding gene SEQ ID NO 3. 
XX 

KW Human; Py2-like receptor; HIPHUM 0000037; immunity; inflammation; 

KW cancer; Crohn's disease; irritable bowel syndrome; rheumatoid arthritis; 

KW immunomodulator; anti-inflammatory; cytostatic; antiasthmatic;

KW gastrointestinal; anti-ulcer; antirheumatic; antiarthritic; virucide;

KW antibacterial; immunosuppressive; dermatological; nephrotropic;

KW antiallergic; analgesic; receptor; gene; ds.
XX 

OS Homo sapiens.
XX 

FH Key Location/Qualifiers 

FT CDS 1..1014 

FT /*tag= a 

FT /product= "P2Y-like receptor variant" 
XX 

PN GB2369364-A. 
XX 

PD 29-MAY-2002. 
XX 

PF 31-AUG-2001; 2001GB-0021215.
XX

PR 01-SEP-2000; 2000GB-0021524.

PR 06-SEP-2000; 2000GB-0021894.

PR 25-SEP-2000; 2000GB-0023444.
XX 

PA (GLAX ) GLAXO GROUP LTD.
XX 

PI Foord SM, Ignar DM; 
XX 

DR WPI; 2002-511268/55. 

DR P-PSDB; ABB83819. 
XX 

PT An isolated P2Y-like receptor polypeptide (HIPHUM 0000037) which can be 

PT used for the identification of agonists and antagonists which may be 

PT used to treat an immune or inflammatory disease - 
XX 

PS Claim 5; Page 28-29; 35pp; English. 
XX 

CC The invention relates to an isolated P2Y-like receptor polypeptide 

CC (ABB83818-ABB83819) which is also referred to in the specification as 

CC HIPHUM 0000037. An effective amount of a substance (agonist or 

CC antagonist) which modulates P2Y receptor activity is useful to treat a 

CC subject having a disorder that is responsive to P2Y-like receptor 

CC modulation. The disorder is a disease of immunity or inflammation. The 

CC substance may also be used to manufacture a medicine for the treatment or 

CC prophylaxis of a disorder that is responsive to stimulation or modulation 

CC of P2Y-like receptor activity. Disorders which may be treated include 

CC colon cancers, asthma, COPD, Crohn's disease, irritable bowel syndrome,

CC gastroenteritis and colitis, inflammatory bowel syndrome, ulcerative 

CC colitis, rheumatoid arthritis, viral diseases, bacterial infections, 

CC autoimmune diseases, dermatitis, glomerulonephritis allergies, allergic 

CC rhinitis, inflammatory pain and general inflammation such as tendonitis, 

CC polymyositis or prostatitis. The invention provides alternative 

CC substances for the treatment of immunological and inflammatory diseases. 

CC The present sequence is the P2Y-like receptor variant encoding gene

CC of the invention.

XX 

SQ Sequence 1014 BP; 258 A; 263 C; 189 G; 304 T; 0 other; 



Query Match 8.2%; Score 126.6; DB 24; Length 1014; 

Best Local Similarity 49.9%; Pred. No. 6.2e-22; 



Matches 377; Conservative 0; Mismatches 369; Indels 9; Gaps 2; 

Qy 60 CTTGTGAGAATTGGTTGGCAACAGAGGCTATCTTGAATAAGTACTACCTCTCTGCATTTT 119

| | | I I Mill I II I I I I I I II I I I I I I I I IN 

Db 59 CTTTTGGAAATTGCACTGATGAAAACATCCCACTCAAGATGCACTACCTCCCTGTTATTT 118

Qy 120 ATGCAATCGAGTTCATTTTTGGACTGCTTGGGAATGTCACTGTGGTGTTCGGCTACCTCT 179

IN || III I I II I I I I I II II II I I I I I M I I 

D b 119 AT GGC AT T AT CTT C CT C GT GGGAT T T C C AGGCAAT GC AGT AGT GAT AT C C ACT T AC AT T T 17 8 

Qy        180 TCTGCATGAAGAACTGGAACAGCAGCAATGTCTATCTTTTTAACCTTTCCATCTCTGACT 239

              ||   ||||     ||||| |||||||   ||  | |  | |||||  ||  | | ||

Db        179 TCAAAATGAGACCTTGGAAGAGCAGCACCATCATTATGCTGAACCTGGCCTGCACAGATC 238

Qy        240 TTGCTTTCCTGTGCACCCTTCCCATCCTGATAAAGAGTTATGCCAATGATAAGGGGA 296

| | | || II I M I II M M I I M I I I I I 

Db        239 TGCTGTATCTGACCAGCCTCCCCTTCCTGATTCACTACTATGCCAGTGGCGTAAACTGGA 298

Qy        297 CCTATGGAGATGTTCTCTGTATAAGCAACCGATATGTGCTTCACACCAACCTCTACACCA 356

               || ||||||| |  | ||||     | ||| |      | ||   |||||| || | ||

Db        299 TCTTTGGAGATTTCATGTGTAAGTTTATCCGCTTCAGCTTCCATTTCAACCTGTATAGCA 358

Qy        357 GCATCCTCTTCCTCACTTTCATTAGCATGGACCGATATCTGCTCATGAAGTACCCTTTCC 416

              |||||||||||||||| |   | |||||   ||| ||     | || |   ||||  |

Db        359 GCATCCTCTTCCTCACCTGTTTCAGCATCTTCCGCTACTGTGTGATCATTCACCCAATGA 418

Qy        417 GAGAACACTTTCTACAAAAGAAGGAATTTGCCATTTT7UVTCTCGCTGGCTGTCTGGGCCT 476

| | | || M I II M I M M II I II I I I I 

Db        419 GCTGCTTTTCCATTCACAAAACTCGATGTGCAGTTGTAGCCTGTGCTGTGGTGTGGATCA 478

Qy        477 TAGTGACCTTAGAAGTTCTACCCATGCTCACTTTCATCAATTCTGTCCCAAAAGAAGAGG 536

              |        |||  ||  | || |||  |   || ||||  ||   |   |      |

Db        479 TTTCACTGGTAGCTGTCATTCCGATGACCTTCTTGATCACATCAACCAACAGGACCAACA 538

Qy        537 GCAGTAACTGCATCGACTATGCAAGTTCTGGAAACCCTGAACACAATCTCATTTACAGCC 596

              |      |||  |||||    | ||||| |       ||||| ||||   ||| |  |

Db        539 GATCAGCCTGTCTCGACCTCACCAGTTCGG------ATGAACTCAATACTATTAAGTGGT 592

Qy        597 TCTGCCTGACTTTGTTGGGCTTCCTAATTCCTCTCTCTGTGATGTGCTTCTTCTACTACA 656

               |  ||||| ||||   |         |    ||| |  || ||    |       | |

Db        593 ACAACCTGATTTTGACTGCAACTACTTTCTGCCTCCCCTTGGTGATAGTGACACTTTGCT 652

Qy        657 AGATGGTAGTCTTCTTAAAGAGGAGGAGCCAGCAGCAAGCAACTGCCCTGCCACTGGACA 716

              | |      |  ||   |    ||        | |||| |      |   |    | | |

Db        653 ATACCACGATTATCCACACTCTGACCCATGGACTGCAAACTGACAGCTGCCTTAAGCAGA 712

Qy        717 AACCCCAACGCCTGGTGGTCCTGGCGGTTGTGATCTTCTCTATACTCTTCACACCCTATC 776

              || | | | | ||     | |||    |  |    || |   ||   ||   |||||  |

Db        713 AAGCACGAAGGCTAACCATTCTGCTACTCCTTGCATTTTACGTATGTTTTTTACCCTTCC 772

Qy        777 ATATCATGCGCAATTTGAGGATCGCCTCACGCCTG 811

              ||||| || |     |  ||||||  || ||||||

Db        773 ATATCTTGAGGGTCATTCGGATCGAATCTCGCCTG 807
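
The summary percentages printed for this result can be recomputed from its reported counts. The sketch below is a consistency check only. It assumes, without confirmation from the report itself, that Best Local Similarity is the number of matches divided by the total number of alignment columns, and that Query Match is the score divided by the perfect score of 1543 given in the search header. Both assumptions reproduce the printed values. All variable names are illustrative.

```python
# Counts copied from the result summary above; the perfect score comes from
# the search header. The two formulas are assumptions, not documented behaviour.
matches, mismatches, indels = 377, 369, 9
score, perfect_score = 126.6, 1543

# First and last residue numbers shown on the Qy and Db alignment lines.
qy_start, qy_end = 60, 811
db_start, db_end = 59, 807


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Every alignment column is a match, a mismatch, or a gap position.
alignment_columns = matches + mismatches + indels

best_local_similarity = percent(matches, alignment_columns)
query_match = percent(score, perfect_score)

# Columns not occupied by a residue on one strand must be gap positions there.
qy_gap_columns = alignment_columns - (qy_end - qy_start + 1)
db_gap_columns = alignment_columns - (db_end - db_start + 1)

print(f"Alignment columns:     {alignment_columns}")
print(f"Best Local Similarity: {best_local_similarity}%")
print(f"Query Match:           {query_match}%")
print(f"Gap columns (Qy, Db):  {qy_gap_columns}, {db_gap_columns}")
```

Under these assumptions the gap columns on the two strands (3 in the query, 6 in the database sequence) add up to the 9 indels reported, which agrees with the coordinate ranges shown on the alignment lines.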